[BUG] SSL Handshake fails on Zscaler #1192

Open
vpacik opened this issue Apr 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments


vpacik commented Apr 17, 2024

Describe the bug
The SSL handshake fails when running PySpark code locally via Databricks Connect on a machine that runs WSL2 behind a corporate VPN (Zscaler).

To Reproduce
Steps to reproduce the behavior:

  1. Create a simple Python script that invokes Spark through DatabricksSession:
     from databricks.connect import DatabricksSession
     spark = DatabricksSession.builder.getOrCreate()
     spark.range(10).show()
  2. Click on 'Run Python File'.
  3. See the error:
E0417 10:14:56.342496763  337339 ssl_transport_security.cc:1519]       Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED.
Traceback (most recent call last):
  File "/home/vpacik/Codes/db-connect-test/spark-test.py", line 5, in <module>
    spark.range(10).show()
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/dataframe.py", line 996, in show
    print(self._show_string(n, truncate, vertical))
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/dataframe.py", line 753, in _show_string
    ).toPandas()
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/dataframe.py", line 1655, in toPandas
    return self._session.client.to_pandas(query)
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 798, in to_pandas
    table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(req)
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1172, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1153, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1308, in _handle_error
    self._handle_rpc_error(error)
  File "/home/vpacik/Codes/.venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py", line 1348, in _handle_rpc_error
    raise SparkConnectGrpcException(str(rpc_error)) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:20.42.4.211:443: Ssl handshake failed: SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED"
        debug_error_string = "UNKNOWN:Error received from peer  {grpc_message:"failed to connect to all addresses; last error: UNKNOWN: ipv4:20.42.4.211:443: Ssl handshake failed: SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED", grpc_status:14, created_time:"2024-04-17T10:14:56.343025663+02:00"}"

System information:
Version: 1.88.1 (user setup)
Commit: e170252f762678dec6ca2cc69aba1570769a5d39
Date: 2024-04-10T17:41:02.734Z
Electron: 28.2.8
ElectronBuildId: 27744544
Chromium: 120.0.6099.291
Node.js: 18.18.2
V8: 12.0.267.19-electron.0
OS: Windows_NT x64 10.0.22621

Additional context
We are using WSL2 on a machine behind a corporate VPN (Zscaler), with the exported root CA in use.
Connecting to the domain with this certificate via openssl works fine (e.g. openssl s_client -connect {servername}:443).
Databricks CLI is working fine on the same machine.
File synchronization via Databricks-Connect also works as expected.
EDIT: Authentication is done via PAT from DBX.
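
The openssl check described above can also be sketched in Python with the standard library. This is only an illustration: `make_context` and `check_handshake` are hypothetical helper names, and the host and CA path in the usage comment are placeholders. Note that Python's `ssl` module is a different TLS stack from the one gRPC uses, so a successful handshake here does not show that the Spark Connect client will trust the same certificate.

```python
import socket
import ssl
from typing import Optional


def make_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a verifying TLS context.

    cafile is a PEM bundle, e.g. the exported corporate root CA.
    With cafile=None the system default trust store is used.
    """
    # create_default_context turns on certificate and hostname verification.
    return ssl.create_default_context(cafile=cafile)


def check_handshake(
    host: str,
    cafile: Optional[str] = None,
    port: int = 443,
    timeout: float = 10.0,
) -> Optional[str]:
    """Open a TLS connection and return the negotiated protocol version.

    Raises ssl.SSLCertVerificationError if the server chain cannot be
    verified against the given CA bundle.
    """
    context = make_context(cafile)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


# Usage (placeholder host and path, replace with your own values):
#   check_handshake("<workspace-host>", cafile="/path/to/root-ca.pem")
```

If this check passes while the Spark Connect call still fails with CERTIFICATE_VERIFY_FAILED, that points at the trust store used by the failing client rather than at the certificate itself.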

vpacik added the bug label Apr 17, 2024
vpacik changed the title from "[BUG] SSL Handshake fails on VPN" to "[BUG] SSL Handshake fails on ZPA" Apr 23, 2024
vpacik changed the title from "[BUG] SSL Handshake fails on ZPA" to "[BUG] SSL Handshake fails on Zscaler" Apr 23, 2024
@stevenayers-bge

@vpacik I've had the same issue. It happens because the Spark Connect client uses gRPC rather than a plain HTTP client, so to resolve the SSL error you need to set the GRPC_DEFAULT_SSL_ROOTS_FILE_PATH environment variable to a PEM file that contains your root CA.

You may run into more issues though, details here: https://community.databricks.com/t5/administration-architecture/proxy-zscaler-amp-databricks-spark-connect-quot-cannot-check/m-p/94737#M2115
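
As a rough sketch of the suggestion above, the variable can be set from Python before the session is created. `use_grpc_roots` is a hypothetical helper, not part of Databricks Connect or gRPC, and the certificate path in the usage comment is a placeholder. The assumption here is that gRPC reads the variable when it first loads its default roots, so it is safest to set it before anything imports `grpc`.

```python
import os
from pathlib import Path

# Environment variable named in the comment above; gRPC reads it to find
# the PEM file holding the root certificates it should trust.
GRPC_ROOTS_ENV = "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"


def use_grpc_roots(pem_path: str) -> str:
    """Point gRPC at a PEM bundle and return the resolved path.

    Hypothetical helper: it validates the file and sets the environment
    variable for the current process only.
    """
    path = Path(pem_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {path}")
    # Cheap sanity check that the file looks like PEM, not DER or an empty file.
    if "-----BEGIN CERTIFICATE-----" not in path.read_text(errors="replace"):
        raise ValueError(f"{path} does not look like a PEM certificate bundle")
    os.environ[GRPC_ROOTS_ENV] = str(path)
    return str(path)


# Usage (placeholder path; call this before importing databricks.connect):
#   use_grpc_roots("/path/to/root-ca.pem")
#   from databricks.connect import DatabricksSession
#   spark = DatabricksSession.builder.getOrCreate()
#   spark.range(10).show()
```

Exporting the same variable in the shell profile inside WSL2 has the same effect and also covers runs started from the editor, provided the editor inherits that environment.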
