Fabrics
Prophecy helps you develop data pipelines in high-quality Spark or SQL code, but what does Prophecy use to execute these pipelines? Before you build any pipeline, you need to connect it to an execution environment.
This is why Fabrics exist in Prophecy. Fabrics let Prophecy connect to specific execution environments.
Prophecy provides a Prophecy-managed Fabric that can get you started with building your pipelines. However, you can also create your own Fabrics to connect to other execution environments, such as a Databricks workspace.
Example
Here is one way you might set up your Fabrics. First, the Admin creates:
- A team named Marketing_DSS for the Marketing Decision Support System users.
- A `dev` Fabric for development activities that specifies the Marketing_DSS team.
- A `prod` Fabric for production pipelines that specifies the Marketing_DSS team.

In this example, all users in the Marketing_DSS team will have access to the `dev` and `prod` Fabrics.
Components
Fabrics include everything required to run a data pipeline.
Spark Environment
A Spark Environment is a named environment that is owned by one team but can be shared with multiple teams. It includes the following components:
- Connection Credentials: For Databricks, these include the Workspace URL and the Access Token.
- Cluster Configuration: For Databricks, this specifies the Databricks Runtime Version, Machine Type, and Idle Timeout.
- Job Sizes: Prophecy allows you to define commonly used cluster sizes and assign them names for easy reference. For example, an "XL" cluster might consist of 10 servers using the `i3.xlarge` instance type, providing 40 CPUs and 70 GB of memory. A sketch of how such a size might map to a cluster spec follows this list.
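The exact shape of this configuration depends on the provider, but for Databricks, the Cluster Configuration and a named Job Size roughly correspond to a cluster spec like the one sketched below. This is a minimal illustration only; the cluster name, runtime version, and idle timeout shown are assumed values, not Prophecy defaults.

```python
# Illustrative sketch: what an "XL" Job Size might look like as a Databricks cluster spec.
# All values are examples; your Fabric's cluster configuration defines the real ones.
xl_cluster_spec = {
    "cluster_name": "marketing-dss-xl",      # hypothetical name
    "spark_version": "13.3.x-scala2.12",     # Databricks Runtime Version
    "node_type_id": "i3.xlarge",             # Machine Type (10 x i3.xlarge = 40 CPUs, as above)
    "num_workers": 10,
    "autotermination_minutes": 60,           # Idle Timeout
}
```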
Scheduler
The Scheduler executes one or more Spark data pipelines on a defined schedule, such as every weekday at 9 a.m. Databricks workspaces include a default scheduler that is always available. For enterprise environments, an Airflow Scheduler option is also provided.
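For reference, "every weekday at 9 a.m." is expressed in the Databricks Jobs API as a Quartz cron expression. The snippet below is a sketch of that schedule object; the timezone is an assumed example.

```python
# Sketch of a Databricks Jobs schedule object for "every weekday at 9 a.m."
# Quartz cron fields: second, minute, hour, day-of-month, month, day-of-week.
weekday_9am_schedule = {
    "quartz_cron_expression": "0 0 9 ? * MON-FRI",
    "timezone_id": "America/New_York",   # assumed timezone
    "pause_status": "UNPAUSED",
}
```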
Database Connections
Data Pipelines often require connections to operational databases, such as MySQL or Postgres, or to data warehouses, such as Snowflake. These connections, using JDBC or other protocols, can be securely stored on the Fabric for convenient reuse.
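To illustrate what such a stored connection is used for, the sketch below shows a plain Spark JDBC read against a Postgres table. The host, database, table, and credentials are placeholder values standing in for whatever is saved on the Fabric.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read an operational Postgres table over JDBC.
# The URL, table, and credentials below are placeholders for values stored on the Fabric.
customers_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/analytics")
    .option("dbtable", "public.customers")
    .option("user", "analytics_reader")
    .option("password", "<password-from-fabric-connection>")
    .load()
)
```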
Metadata Connection
Optionally, you can enhance your Fabric by creating a Metadata Connection. This is especially useful for users managing hundreds or thousands of tables in their data providers. For more details, see the Metadata Connections documentation.
Credentials and Secrets
Prophecy enables you to securely store credentials in the Databricks environment. When connecting to Databricks, you can either use a Personal Access Token (PAT) or leverage Databricks OAuth.
Key-value pairs can be stored as secrets, which are accessible to running workflows. After a secret is created, it can only be read by running jobs, and Prophecy does not have access to its value.
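Inside a running Databricks job or notebook, such a secret is typically read with Databricks Utilities, as in the sketch below; the scope and key names are hypothetical.

```python
# Read a secret from inside a running Databricks job or notebook (dbutils is provided there).
# "marketing-dss" and "snowflake-password" are hypothetical scope and key names.
snowflake_password = dbutils.secrets.get(scope="marketing-dss", key="snowflake-password")
# The returned value is redacted in notebook output and is only usable by the running job.
```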
Hands on
Get started with these hands-on guides, which show you step by step how to connect to your execution engine by creating a Fabric:
- Create a SQL Fabric with a JDBC or Unity Catalog connection following this guide.
- Create a Databricks Fabric following these steps.
- Create an EMR Fabric with Livy by following the steps here.
What's next
To learn more about Fabrics and the Prophecy Libraries, see the following page:
📄️ Metadata Connections: sync catalogs, tables, and schemas into Prophecy's Project viewer.