Prophecy deployment is simple and flexible. Prophecy is written as a set of microservices that run on Kubernetes and is built to be multi-tenant. There are three primary options:
Prophecy in the cloud connects to your existing Spark and Scheduler/Orchestrator. Prophecy does not store any data; however, it does store metadata about your Pipelines, Datasets, and schedules.
Public SaaS (Prophecy managed SaaS) is the default option when you connect from Databricks Partner Connect and is free for one user. Customers use this option heavily to try Prophecy, and our startup and midsize customers who like the convenience of a managed service prefer it. You can also use this option by going directly to the Prophecy application.
Private SaaS (Customer VPC)
Customers in segments that deal with very sensitive data primarily use this option. Here, Prophecy runs within the Customer VPC and connects to the identity, Spark clusters and the scheduler within the VPC.
This is the default option when you go through the cloud marketplaces. You can install the software from the Azure Marketplace. The installation is very simple, takes about 20 minutes (with a confirmation popup), and billing starts after 30 days.
On rare occasions, Prophecy is deployed on-premise for large customers who are moving to the cloud. Typically, these organizations first move Pipelines from on-premise legacy ETL tools to Spark, then move them to Spark on the cloud. For more information, read the on-premise installation documentation or reach out to our team by requesting a demo.
Prophecy connects to the following external services:
- Spark - for interactive code execution
- Schedulers - for code orchestration
- Git - for code storage
- Identity Providers - for easier user authentication and authorization
Prophecy connects to Databricks using the REST API. Each Fabric defined in Prophecy refers to a single Databricks workspace, and each user is required to provide a personal access token to authenticate to it.
Security-conscious enterprises that use Databricks with limited network access must additionally add the Prophecy Data Plane IP address (220.127.116.11) to the Databricks allowed access list.
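As a rough illustration, Databricks workspaces expose an IP access list REST endpoint (`/api/2.0/ip-access-lists`) that can be used to add an allow entry. The sketch below only builds the request and does not send it. The function name, the label, and the workspace URL are hypothetical, and it assumes the IP access list feature is enabled for the workspace and that the token belongs to a workspace admin.

```python
import json
import urllib.request

# IP address quoted in the text above for the Prophecy Data Plane.
PROPHECY_DATA_PLANE_IP = "220.127.116.11"


def build_allow_list_request(workspace_url, token, label="prophecy-data-plane"):
    """Build (but do not send) a request that adds an ALLOW entry.

    `label` is a hypothetical, human-readable name for the entry.
    """
    payload = {
        "label": label,
        "list_type": "ALLOW",
        "ip_addresses": [PROPHECY_DATA_PLANE_IP],
    }
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/ip-access-lists",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To apply the change, the returned request would be passed to `urllib.request.urlopen`. Many teams manage this list through the Databricks admin tooling instead, so treat this as one possible approach.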
Prophecy primarily uses Databricks for the following functionality:
- Interactive Execution - Prophecy allows its users to spin up new clusters or connect to existing clusters. When a cluster connection exists, Prophecy allows the user to run their code in interactive mode. Interactive code queries are sent to Databricks using the Databricks Command API 1.2.
- Scheduling - Prophecy allows the user to build and orchestrate Databricks Jobs. This works through the Databricks Jobs API 2.1.
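To give a feel for the two APIs named above, here is a minimal sketch of the request shapes for the Databricks Command API 1.2 (create an execution context, then execute a command in it) and the Jobs API 2.1 (trigger a job run). It shows the public endpoints only and is not a description of how Prophecy calls them internally. The helper names and all argument values are hypothetical, and the requests are built but not sent.

```python
import json
import urllib.request


def _post(workspace_url, path, token, body):
    """Build a JSON POST request authenticated with a personal access token."""
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_context_request(workspace_url, token, cluster_id, language="python"):
    """Command API 1.2: open an execution context on a running cluster."""
    return _post(workspace_url, "/api/1.2/contexts/create", token,
                 {"clusterId": cluster_id, "language": language})


def execute_command_request(workspace_url, token, cluster_id, context_id,
                            command, language="python"):
    """Command API 1.2: run a snippet of code inside an existing context."""
    return _post(workspace_url, "/api/1.2/commands/execute", token,
                 {"clusterId": cluster_id, "contextId": context_id,
                  "language": language, "command": command})


def run_job_now_request(workspace_url, token, job_id):
    """Jobs API 2.1: trigger a run of an already defined job."""
    return _post(workspace_url, "/api/2.1/jobs/run-now", token,
                 {"job_id": job_id})
```

In an interactive session, the context ID returned by the first call would be reused for each subsequent `commands/execute` call, which is what lets state persist between queries.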
By default, Prophecy does not store any data samples when executing code using Databricks. Data samples can be optionally stored for observability purposes (execution metrics).
When using Active Directory, Prophecy takes care of auto-generation and refreshing of the Databricks personal access tokens. Read more about it here.
While all code generated by Prophecy is stored in a User’s Git repository, we temporarily store some of the generated code used during Interactive development in an encrypted cache.
Supported Git providers:
- Prophecy Managed - Prophecy automatically sets up the connectivity between itself and the repositories. Prophecy Managed is based on the open-source Gitea.
- GitHub (including GitHub Enterprise) - authenticates using per-user personal access tokens. How to generate PAT?
- Bitbucket (including Bitbucket self-hosted) - authenticates using per-user personal access tokens. How to generate PAT?
- GitLab (including GitLab self-hosted) - authenticates using per-user personal access tokens. How to generate PAT?
- Azure DevOps - authenticates using per-user personal access tokens. How to generate PAT?
Security-conscious enterprises that use Git providers within private networks behind firewalls must add the Prophecy Control Plane IP address (18.104.22.168) to the private network allow-list or to the Git provider allow-list.
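Before opening a support ticket, it can help to check whether an existing allow-list already covers the address. The following is a small sketch using Python's standard `ipaddress` module. The function name and the example allow-list entries are hypothetical, and the real list would come from your firewall or Git provider configuration.

```python
import ipaddress

# IP address quoted in the text above for the Prophecy Control Plane.
PROPHECY_CONTROL_PLANE_IP = "18.104.22.168"


def is_allowed(ip, allow_list):
    """Return True if `ip` falls inside any entry of `allow_list`.

    Entries may be single addresses ("10.1.2.3") or CIDR ranges ("10.0.0.0/8").
    """
    address = ipaddress.ip_address(ip)
    return any(
        # strict=False tolerates entries written with host bits set.
        address in ipaddress.ip_network(entry, strict=False)
        for entry in allow_list
    )


# Hypothetical allow-list that does not yet include the Prophecy address.
current_allow_list = ["10.0.0.0/8", "192.168.1.0/24"]
if not is_allowed(PROPHECY_CONTROL_PLANE_IP, current_allow_list):
    current_allow_list.append(f"{PROPHECY_CONTROL_PLANE_IP}/32")
```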
Coming soon: users will be able to connect to common Git providers using their respective OAuth functionality, e.g. GitHub OAuth or Azure AD.