
Databricks

Create a Databricks Fabric to connect Prophecy to your existing Databricks environment. Think of a Fabric as a connection to your Databricks workspace. The Fabric enables Prophecy to connect to existing Spark clusters (or create new ones), execute Spark pipelines, and read and write data, all according to the permissions granted by each user's personal access token.

Please refer to the video below for a step-by-step example.

  • Databricks Credentials - Provide your Databricks Workspace URL and Personal Access Token (PAT). The PAT must have permission to attach to clusters; if you want Prophecy to create clusters or read and write data, the PAT needs those permissions as well. Each user supplies their own PAT in the Fabric, and Prophecy respects the permissions scoped to that user (see the sketch after this list).
  • Cluster Details - Provide the Databricks Runtime version, the Executor and Driver machine types, and a termination timeout, if any. Prophecy uses these cluster details when it creates clusters for interactive development and for the job clusters used in scheduled Databricks Job runs.
  • Job sizes - Define Job sizes that determine the clusters spawned when you test through the Prophecy IDE. A Job size specifies the cluster mode, Databricks Runtime version, total number of executors, and the cores and memory for each, exposing the same options Databricks offers when you create a cluster directly. We recommend using the smallest machines and the fewest nodes appropriate for your use case.
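For reference, here is a minimal sketch of how the credentials are used. It calls the standard Databricks Clusters REST API (`/api/2.0/clusters/list`) with a PAT to confirm the token can at least see clusters in the workspace; the workspace URL and token are placeholders, and this is an illustration rather than anything Prophecy runs on your behalf.

```python
import requests

# Placeholder values: substitute your own workspace URL and PAT.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

def check_pat_can_list_clusters(workspace_url: str, token: str) -> None:
    """Smoke-test a PAT against the Databricks Clusters API.

    A PAT that cannot list clusters will not be able to attach to them
    from Prophecy either.
    """
    resp = requests.get(
        f"{workspace_url}/api/2.0/clusters/list",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()
    clusters = resp.json().get("clusters", [])
    print(f"PAT is valid; {len(clusters)} cluster(s) visible to this user.")

check_pat_can_list_clusters(WORKSPACE_URL, TOKEN)
```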

Editing a Job

In the JSON view, you can copy and paste your compute JSON directly from Databricks.
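As an illustration, the snippet below builds the kind of compute specification you might copy from a Databricks cluster's JSON view. Every field value is a placeholder example; the exact fields available depend on your Databricks workspace.

```python
import json

# Placeholder compute specification, mirroring fields from the
# Databricks cluster UI's JSON view (values are examples only).
compute_spec = {
    "spark_version": "14.3.x-scala2.12",   # Databricks Runtime version
    "node_type_id": "i3.xlarge",           # executor machine type
    "driver_node_type_id": "i3.xlarge",    # driver machine type
    "num_workers": 2,                      # total number of executors
    "autotermination_minutes": 60,         # termination timeout
}

# The JSON you paste into the Job size editor looks like this:
print(json.dumps(compute_spec, indent=2))
```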

  • Prophecy Library - Scala and Python libraries written by Prophecy that provide additional functionality on top of Spark. They are installed automatically in your Spark execution environment whenever you attach to an existing cluster or create a new one, and they are also publicly available on Maven Central and PyPI.
  • Metadata Connection - Optionally, enhance your Fabric by creating a Metadata Connection. This is recommended for users with hundreds or thousands of tables housed in their data provider(s).
  • Artifacts - Prophecy supports Databricks Volumes. When you run a Python or Scala Pipeline via a Job, the Pipeline must be bundled as a whl/jar artifact, and that artifact must be accessible to the Databricks Job so it can be installed on the cluster as a library. Under Artifacts, designate a Volume path where the whl/jar files are uploaded (see the upload sketch below).
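To show what that Volume path is used for, here is a hedged sketch of the underlying operation: uploading a locally built wheel to a Unity Catalog Volume through the Databricks Files API. The workspace URL, token, local wheel path, and Volume path are all placeholders; when you configure an Artifacts path in the Fabric, this upload is handled for you.

```python
from pathlib import Path

import requests

# Placeholder values for illustration only.
WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"
LOCAL_WHEEL = Path("dist/my_pipeline-0.1-py3-none-any.whl")  # hypothetical build output
VOLUME_PATH = "/Volumes/<catalog>/<schema>/<volume>/artifacts/my_pipeline-0.1-py3-none-any.whl"

def upload_wheel_to_volume() -> None:
    """Upload a wheel to a Unity Catalog Volume via the Databricks Files API."""
    with LOCAL_WHEEL.open("rb") as f:
        resp = requests.put(
            f"{WORKSPACE_URL}/api/2.0/fs/files{VOLUME_PATH}",
            headers={
                "Authorization": f"Bearer {TOKEN}",
                "Content-Type": "application/octet-stream",
            },
            params={"overwrite": "true"},
            data=f,
            timeout=300,
        )
    resp.raise_for_status()
    print(f"Uploaded {LOCAL_WHEEL.name} to {VOLUME_PATH}")

upload_wheel_to_volume()
```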

Databricks Execution


Interactive Execution

Execution Metrics