Send Spark cluster details

To help us troubleshoot, you can send us your Spark cluster configurations and the output of a connectivity check via the Prophecy Support Portal.

Spark configurations

Two ways to access the configurations:

  • Browsing the Spark UI
  • Running a notebook

Configurations in the UI

You can access your Spark cluster configurations directly from the Spark UI.

note

Please send screenshots of each configuration if possible.

Send the following configurations:

  • Overall cluster configuration (e.g., Spark version, Databricks runtime version, UC single-user or UC shared access mode)
  • Cluster JSON, edited to remove any private or sensitive information (a scripted approach is sketched after this list)
  • Libraries installed on the cluster
  • Init scripts run on the cluster, including the scripts themselves if possible
  • Output of attaching the cluster in a notebook; you may need to duplicate the browser tab and try attaching the same cluster in the duplicate tab
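If you prefer to gather the cluster JSON with code, here is a minimal sketch that pulls the cluster definition from the Databricks Clusters API and drops a few commonly sensitive fields before you attach it to your ticket. The workspace URL, token, and cluster ID are placeholders, and the list of redacted keys is illustrative rather than exhaustive; review the output yourself before sending it.

import json
import requests

# Placeholders: replace with your workspace URL, personal access token, and cluster ID
workspace_url = "replace_with_workspace_url"
token = "replace_with_token"
cluster_id = "replace_with_cluster_id"

# Fetch the full cluster definition from the Clusters API
response = requests.get(
    f"{workspace_url}/api/2.1/clusters/get",
    headers={"Authorization": f"Bearer {token}"},
    params={"cluster_id": cluster_id},
)
response.raise_for_status()
cluster_json = response.json()

# Illustrative redaction: remove keys that often carry account-specific details
for key in ("aws_attributes", "azure_attributes", "gcp_attributes", "custom_tags"):
    cluster_json.pop(key, None)

print(json.dumps(cluster_json, indent=2))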

Run a notebook

If you prefer to use code, create a notebook like the example below, run it, and send the output via the Prophecy Support Portal.

info

Replace the workspace URL, personal access token, and cluster ID placeholders as appropriate.

Python

# Databricks notebook source
import requests

# Get the Databricks runtime of the cluster
# Get the notebook context using dbutils
context = dbutils.notebook.entry_point.getDbutils().notebook().getContext()

# Retrieve the Databricks runtime version from the context tags
runtime_version = context.tags().get("sparkVersion").get()

# Print the runtime version
print(f"Databricks Runtime Version: {runtime_version}")

# Get Spark version
spark_version = spark.version
print(f"Spark Version: {spark_version}")


# Get the installed libraries and access mode details of the cluster
# Replace with your Databricks workspace URL, token, and cluster ID
workspace_url = "replace_with_workspace_url"
token = "replace_with_token"
cluster_id = "replace_with_cluster_id"


# API endpoint to get info of installed libraries
url = f"{workspace_url}/api/2.0/libraries/cluster-status"

# Make the API request
response = requests.get(url, headers={"Authorization": f"Bearer {token}"}, params={"cluster_id": cluster_id})

library_info = response.json()
print("Libraries:")
for i in library_info['library_statuses']:
    print(i)

# API endpoint to get access mode details
url = f"{workspace_url}/api/2.1/clusters/get"

# Make the API request
response = requests.get(url, headers={"Authorization": f"Bearer {token}"}, params={"cluster_id": cluster_id})

cluster_access_info=response.json()
print(f"Cluster Access Mode: {cluster_access_info['data_security_mode']}")

Connectivity check

Open a notebook on the Spark cluster and run the following command.

info

Replace customer_prophecy_url with your Prophecy endpoint.

import subprocess

command = 'curl -X GET "https://customer_prophecy_url/execution"'
output = subprocess.check_output(['/bin/bash', '-c', command], text=True)

print(output)

This command tests the reverse WebSocket connection that Prophecy requires to execute Pipelines on Spark clusters. Please send the output of this command via the Prophecy Support Portal.
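If curl is not available on the cluster, a roughly equivalent check can be run with the requests library instead. This is a sketch using the same placeholder endpoint; it also surfaces the HTTP status code, which is useful to include in your ticket.

import requests

# Placeholder: replace with your Prophecy endpoint
url = "https://customer_prophecy_url/execution"

response = requests.get(url, timeout=30)
print(f"Status code: {response.status_code}")
print(response.text)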

We look forward to hearing from you!