Databricks serverless compute for PySpark
Databricks serverless compute allows you to run workloads without manually provisioning a Spark cluster. With serverless compute, Databricks takes care of the infrastructure in the background, so your jobs start up quickly and scale as needed. Prophecy supports serverless compute for running pipelines in PySpark projects on Databricks.
This page explains how to use serverless compute with Prophecy, including supported data sources, data sampling modes, and current limitations.
Databricks serverless compute differs from serverless SQL warehouses. Prophecy uses serverless compute to run Spark pipelines on Spark fabrics. In contrast, serverless SQL warehouses are connected to Prophecy via JDBC and are used to run SQL queries generated from pipelines in SQL projects.
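To illustrate the distinction, a serverless Spark Connect session can be opened outside Prophecy with Databricks Connect, while a SQL warehouse is reached over JDBC. This is a minimal sketch, assuming a recent databricks-connect release is installed and workspace authentication is already configured (for example, via a Databricks config profile):

```python
# Illustration only (outside Prophecy): open a Spark Connect session
# against Databricks serverless compute rather than a named cluster.
from databricks.connect import DatabricksSession

# Request serverless compute instead of specifying a cluster ID.
spark = DatabricksSession.builder.serverless(True).getOrCreate()

# Verify the session works by running a trivial query.
spark.range(5).show()
```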
Prerequisites
To use serverless compute in Prophecy, you need:
- Access to serverless compute in Databricks
- A PySpark project in Prophecy (Scala projects are not supported)
Supported data sources
You can use the following data sources when running pipelines on Databricks serverless compute (a short example follows the list):
- Avro
- CSV
- Data Generator
- Delta file
- JSON
- Kafka
- ORC
- Parquet
- Seed files
- Unity Catalog tables
- Unity Catalog volumes
- XML
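In pipeline code, these sources are read with standard PySpark readers. The sketch below assumes an active Spark session; the catalog, schema, table, and volume paths are hypothetical placeholders:

```python
# Minimal sketch of reading supported sources on serverless compute.
# All names and paths below are illustrative placeholders.
df_table = spark.read.table("main.demo.orders")  # Unity Catalog table

df_csv = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/demo/landing/orders.csv")  # Unity Catalog volume
)

df_parquet = spark.read.parquet("/Volumes/main/demo/landing/orders.parquet")

df_table.limit(5).show()
```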
Supported data sampling modes
You can use the following data sampling modes with Databricks serverless compute:
Limitations
The following table lists current limitations of Databricks serverless compute and how they affect Prophecy project development.

| Feature | Limitation |
|---|---|
| Scala support | Databricks serverless compute only supports Python and SQL. Scala projects cannot run on serverless compute. |
| Dependencies | Only Python dependencies are supported, and they must be added through the Prophecy UI. You cannot install dependencies on serverless compute directly in Databricks. |
| Row size | The maximum row size is 128 MB. |
| Driver size | Databricks does not expose the serverless driver size, and it cannot be changed. |
| Supported data formats | XLSX, fixed-format, and custom formats are not supported. |
| UDF network access | UDFs cannot access the internet. |
| Spark configuration | Serverless compute supports only a limited set of Spark configuration properties. |
| APIs in Script gems | Spark Connect APIs are supported. Spark RDD APIs and DataFrame/SQL cache APIs are not supported (see the sketch after this table). |
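As an illustration of the Script gem constraint, the sketch below shows which API families run over Spark Connect. The function and column names are hypothetical, not generated Prophecy code:

```python
# Sketch of a Script gem body on serverless compute.
from pyspark.sql import functions as F

def clean_orders(spark, in0):
    # Supported: DataFrame and Spark SQL APIs, which run over Spark Connect.
    return in0.filter(F.col("amount") > 0).withColumn("currency", F.lit("USD"))

# Not supported on serverless compute:
#   in0.rdd.map(...)    -- Spark RDD APIs
#   in0.cache()         -- DataFrame/SQL cache APIs
#   spark.sparkContext  -- SparkContext is unavailable over Spark Connect
```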
For the complete list of limitations, visit Serverless compute limitations in the Databricks documentation.