Prophecy deployment is simple and flexible. Prophecy is written as a set of microservices that run on Kubernetes and is built to be multi-tenant. There are three primary deployment options.
Prophecy in the cloud connects to your existing Spark and scheduler/orchestrator. Prophecy does not store any data; however, it does store metadata about your pipelines, datasets, and schedules.
Public SaaS (Prophecy managed SaaS) is the default option when you connect from Databricks Partner Connect and is free for one user. Customers use this option heavily to try Prophecy, and our startup and midsize customers who like the convenience of a managed service prefer it. You can also use it by going directly to the Prophecy application.
Private SaaS (Customer VPC)
Our enterprise customers, and midsize and startup customers in segments that deal with very sensitive data, primarily use this option. Here, Prophecy runs within the customer VPC and connects to the identity provider, Spark clusters, and scheduler within the VPC.
This is the default option when you go through the cloud marketplaces. You can install the software from the Azure Marketplace. The install is very simple and takes about 20 minutes; billing starts after 30 days (and a confirmation popup).
On rare occasions, Prophecy will deploy on-premise for large customers who are moving to the cloud. Often, these organizations first move pipelines from on-premise legacy ETL tools to Spark, then move them to Spark on the cloud. For more information, read the on-premise installation documentation or reach out to our team by using request a demo.
Prophecy connects to the following external services:
- Spark - for interactive code execution
- Schedulers - for code orchestration
- Git - for code storage
- Identity Providers - for easier user authentication and authorization
Prophecy connects to Databricks using the REST API. Each fabric defined in Prophecy refers to a single Databricks workspace, and each user is required to provide a personal access token to authenticate to it.
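The relationship between a fabric, a workspace, and a per-user token can be sketched as follows. This is a minimal illustration, not Prophecy's internal model: the `Fabric` class, the `api_url` and `auth_headers` helpers, the workspace URL, and the token value are all hypothetical. The only external fact it relies on is that Databricks REST APIs accept a personal access token as a bearer token.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fabric:
    """Hypothetical model: one fabric points at exactly one Databricks workspace."""
    name: str
    workspace_url: str  # placeholder value used below, not a real workspace


def auth_headers(personal_access_token: str) -> dict:
    """Build the header that carries a user's personal access token."""
    return {"Authorization": f"Bearer {personal_access_token}"}


def api_url(fabric: Fabric, path: str) -> str:
    """Join the fabric's workspace URL with a REST API path."""
    return fabric.workspace_url.rstrip("/") + "/" + path.lstrip("/")


# Placeholder fabric and token, for illustration only.
dev = Fabric(name="dev", workspace_url="https://example.cloud.databricks.com/")
print(api_url(dev, "/api/2.1/jobs/list"))
print(auth_headers("dapi-placeholder"))
```

Because the token belongs to the user rather than to the fabric, two users working on the same fabric would send requests to the same workspace URL with different `Authorization` headers.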
Security-conscious enterprises that use Databricks with limited network access have to additionally whitelist the Prophecy Data Plane IP address (
Prophecy primarily uses Databricks for the following functionality:
- Interactive Execution - Prophecy allows its users to spin up new clusters or connect to existing clusters. When a cluster connection exists, Prophecy allows the user to run their code in interactive mode. Interactive code queries are sent to Databricks using the Databricks Command API 1.2.
- Scheduling - Prophecy allows the user to build and orchestrate Databricks Jobs. This works through the Databricks Jobs API 2.1.
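To make the two API families above concrete, here is a minimal sketch of what requests to the Databricks Command API 1.2 and Jobs API 2.1 look like. It only builds the requests and does not send them. The workspace URL, token, cluster ID, context ID, and job ID are placeholders, and the `build_request` helper is hypothetical; the exact calls Prophecy issues internally are not documented here and may differ.

```python
import json
from urllib.request import Request

WORKSPACE = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "dapi-placeholder"                          # placeholder personal access token


def build_request(path: str, payload: dict) -> Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return Request(
        url=WORKSPACE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Interactive execution: run a command inside an existing execution context.
# A context ID is normally obtained first from /api/1.2/contexts/create.
execute = build_request(
    "/api/1.2/commands/execute",
    {
        "clusterId": "0000-000000-example",  # placeholder cluster ID
        "contextId": "placeholder-context",  # placeholder context ID
        "language": "python",
        "command": "spark.range(5).count()",
    },
)

# Scheduling: trigger a run of an existing job by its ID (placeholder value).
run_now = build_request("/api/2.1/jobs/run-now", {"job_id": 123})

for req in (execute, run_now):
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; that step is left out so the sketch can be read and run without a real workspace.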
By default, Prophecy does not store any data samples when executing code using Databricks. Data samples can be optionally stored for observability purposes (execution metrics).
When using Active Directory, Prophecy takes care of auto-generation and refreshing of the Databricks personal access tokens. Read more about it here.
Supported Git providers:
- Prophecy Managed - Prophecy automatically sets up the connectivity between itself and the repositories. Prophecy Managed is based on open-source Gitea.
- GitHub (including GitHub Enterprise) - authenticates using per-user personal access tokens. How to generate PAT?
- Bitbucket (including Bitbucket self-hosted) - authenticates using per-user personal access tokens. How to generate PAT?
- GitLab (including GitLab self-hosted) - authenticates using per-user personal access tokens. How to generate PAT?
- Azure DevOps - authenticates using per-user personal access tokens. How to generate PAT?
Users will be able to connect to common Git providers by leveraging their respective OAuth functionality, e.g. GitHub OAuth or Azure AD.