Databricks serverless compute for PySpark

Databricks serverless compute allows you to run workloads without manually provisioning a Spark cluster. With serverless compute, Databricks takes care of the infrastructure in the background, so your jobs start up quickly and scale as needed. Prophecy supports serverless compute for running pipelines in PySpark projects on Databricks.

This page explains how to use serverless compute with Prophecy, including supported data sources, data sampling modes, and current limitations.

info

Databricks serverless compute differs from serverless SQL warehouses. Prophecy uses serverless compute to run Spark pipelines on Spark fabrics. In contrast, serverless SQL warehouses are connected to Prophecy via JDBC and are used to run SQL queries generated from pipelines in SQL projects.

Prerequisites

To use serverless compute in Prophecy, you need:

Supported data sources

You can run the following sources on Databricks serverless compute:

Supported data sampling modes

You can use the following data sampling modes when using Databricks serverless compute:

Limitations

Below are the current limitations of Databricks serverless compute and how they affect Prophecy project development.

Feature: Limitation
Scala support: Databricks serverless only supports Python and SQL. Scala projects cannot run on Databricks serverless.
Dependencies: Only Python dependencies are supported. Dependencies must be added through the Prophecy UI. You cannot install dependencies to serverless compute directly in Databricks.
Row size: Maximum row size is 128 MB.
Driver size: Databricks serverless driver size is unknown and cannot be changed.
Supported data formats: XLSX, fixed format, and custom formats are not supported.
UDF network access: UDFs cannot access the internet.
Spark configuration: Databricks serverless only supports a limited number of Spark configuration properties.
APIs in Script gems: Spark Connect APIs are supported. Spark RDD APIs are not supported. DataFrame and SQL cache APIs are not supported.
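
The Script gem restrictions above can be pre-screened before a pipeline runs. The following is a minimal sketch, not a Prophecy or Databricks feature: the helper `find_unsupported_calls` and the `UNSUPPORTED_ATTRS` mapping are hypothetical names introduced for illustration, and the list of flagged attributes is an assumed, non-exhaustive sample of RDD and DataFrame cache entry points. It uses only Python's standard `ast` module to scan Script gem source text for attribute accesses that typically indicate unsupported API usage.

```python
import ast

# Assumption: an illustrative, non-exhaustive set of attribute names that
# usually signal RDD or DataFrame cache API usage in PySpark code.
UNSUPPORTED_ATTRS = {
    "rdd": "Spark RDD API",
    "sparkContext": "Spark RDD API",
    "cache": "DataFrame cache API",
    "persist": "DataFrame cache API",
    "unpersist": "DataFrame cache API",
}


def find_unsupported_calls(source: str) -> list:
    """Return sorted (line, attribute, reason) tuples for flagged accesses."""
    findings = []
    # Walk every node in the parsed code and flag matching attribute names.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Attribute) and node.attr in UNSUPPORTED_ATTRS:
            findings.append((node.lineno, node.attr, UNSUPPORTED_ATTRS[node.attr]))
    return sorted(findings)


# Hypothetical Script gem body used only to exercise the helper.
script = """
df = spark.read.table("orders")
df.cache()
counts = df.rdd.map(lambda r: (r[0], 1)).collect()
totals = df.groupBy("region").count()
"""

for line, attr, reason in find_unsupported_calls(script):
    print(f"line {line}: .{attr} ({reason})")
```

In this sample, the helper flags `df.cache()` and `df.rdd` and leaves the `groupBy` aggregation alone. Because the check matches attribute names only, it can report false positives on unrelated objects that expose a `cache` or `persist` attribute, and it does not detect SQL cache statements passed as strings to `spark.sql`.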
note

For the complete list of limitations, visit Serverless compute limitations in the Databricks documentation.