Fabrics

Prophecy helps you develop data pipelines in high-quality Spark or SQL code. But what does Prophecy use to compute these pipelines? Before you can build any pipeline, you must connect it to an execution environment.

This is why Fabrics exist in Prophecy: a Fabric connects Prophecy to a specific execution environment.

Prophecy provides a Prophecy-managed Fabric to get you started building your pipelines. However, you can also create your own Fabrics to connect to other execution environments, such as a Databricks workspace.

Example

Here is one way you might set up your Fabrics. First, the Admin creates:

  • A team named Marketing_DSS for the Marketing Decision Support System users.
  • A dev Fabric for development activities that specifies the Marketing_DSS team.
  • A prod Fabric for production pipelines that specifies the Marketing_DSS team.

In this example, all users on the Marketing_DSS team have access to both the dev and prod Fabrics.

Components

Fabrics include everything required to run a data pipeline.

Spark Environment

A Spark Environment is a named environment that is owned by one team but can be shared with multiple teams. It includes the following components:

  • Connection Credentials: For Databricks, these include the Workspace URL and the Access Token.
  • Cluster Configuration: For Databricks, this specifies the Databricks Runtime Version, Machine Type, and Idle Timeout.
  • Job Sizes: Prophecy allows you to define commonly used cluster sizes and assign them names for easy reference. For example, an "XL" cluster might consist of 10 servers using the i3.xlarge instance type, providing 40 CPUs and roughly 300GB of memory. The sketch after this list shows how these settings fit together.
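
To make these settings concrete, here is a minimal sketch of how the connection credentials, cluster configuration, and an "XL" job size might map onto a Databricks cluster definition. The workspace URL, token, and runtime values are placeholders, and the field names follow the Databricks Clusters REST API rather than Prophecy's UI labels.

    # Hypothetical values showing what a Fabric's Spark environment stores.
    import requests

    workspace_url = "https://<your-workspace>.cloud.databricks.com"   # connection credential
    access_token = "<databricks-personal-access-token>"               # connection credential

    # Cluster configuration plus a named "XL" job size, expressed as a
    # Databricks Clusters API payload.
    xl_cluster = {
        "cluster_name": "prophecy-xl",
        "spark_version": "13.3.x-scala2.12",   # Databricks Runtime Version
        "node_type_id": "i3.xlarge",           # Machine Type
        "num_workers": 10,                     # "XL" size: 10 workers
        "autotermination_minutes": 60,         # Idle Timeout
    }

    # Creating the cluster directly through the REST API, purely for illustration;
    # Prophecy provisions clusters for you based on the Fabric settings.
    response = requests.post(
        f"{workspace_url}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {access_token}"},
        json=xl_cluster,
    )
    response.raise_for_status()
    print(response.json()["cluster_id"])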

Scheduler

The Scheduler executes one or more Spark data pipelines on a defined schedule, such as every weekday at 9 a.m. Databricks workspaces include a default scheduler that is always available. For enterprise environments, an Airflow Scheduler option is also provided.
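
As an illustration of the "every weekday at 9 a.m." example, the Databricks-native scheduler expresses schedules as Quartz cron expressions in its Jobs API. The job name below is a placeholder; Prophecy generates and manages the actual job definition when you schedule a pipeline.

    # Hypothetical Databricks Jobs API fragment: run every weekday at 9 a.m.
    weekday_schedule = {
        "name": "marketing-dss-daily-load",   # placeholder job name
        "schedule": {
            "quartz_cron_expression": "0 0 9 ? * MON-FRI",  # sec min hour dom month dow
            "timezone_id": "America/Los_Angeles",
            "pause_status": "UNPAUSED",
        },
    }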

Database Connections

Data pipelines often require connections to operational databases, such as MySQL or Postgres, or to data warehouses, such as Snowflake. These connections, using JDBC or other protocols, can be securely stored on the Fabric for convenient reuse.
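
For example, once a Postgres connection is stored on the Fabric, the generated Spark code can read from it over JDBC roughly as follows. The host, database, table, and user names are placeholders, and in practice the password would come from a secret rather than being hardcoded.

    # Minimal PySpark JDBC read sketch; all connection details are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    orders_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://db.example.com:5432/marketing")
        .option("dbtable", "public.orders")
        .option("user", "marketing_reader")
        .option("password", "<read-from-a-secret>")
        .option("driver", "org.postgresql.Driver")
        .load()
    )
    orders_df.show(5)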

Metadata Connection

Optionally, you can enhance your Fabric by creating a Metadata Connection. This is especially useful for users managing hundreds or thousands of tables in their data providers. For more details, see the Metadata Connections documentation.

Credentials and Secrets

Prophecy enables you to securely store credentials in the Databricks environment. When connecting to Databricks, you can either use a Personal Access Token (PAT) or leverage Databricks OAuth.
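
As a sketch of the two options, the Databricks SDK for Python accepts either a personal access token or OAuth client credentials when authenticating to a workspace. The host and credential values are placeholders; Prophecy performs this handshake itself when you configure the Fabric.

    # Hypothetical authentication sketch using the Databricks SDK for Python.
    from databricks.sdk import WorkspaceClient

    # Option 1: Personal Access Token (PAT)
    w = WorkspaceClient(
        host="https://<your-workspace>.cloud.databricks.com",
        token="<databricks-personal-access-token>",
    )

    # Option 2: Databricks OAuth (machine-to-machine with a service principal)
    w = WorkspaceClient(
        host="https://<your-workspace>.cloud.databricks.com",
        client_id="<oauth-client-id>",
        client_secret="<oauth-client-secret>",
    )

    print(w.current_user.me().user_name)   # verify the connection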

Key-value pairs can be stored as secrets, which are accessible to running workflows. After a secret is created, it can only be read by running jobs, and Prophecy does not have access to its value.
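
For instance, a pipeline running on Databricks can look up a stored key-value secret at runtime through the secrets utility; the scope and key names below are placeholders.

    # dbutils is available automatically inside Databricks notebooks and jobs.
    # The secret value is visible only to the running job, not to Prophecy.
    db_password = dbutils.secrets.get(scope="marketing_dss", key="postgres_password")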

Hands on

Get started with these hands-on guides, which show you step by step how to connect to your execution engine by creating a Fabric:

  1. Create a SQL Fabric with a JDBC or Unity Catalog connection following this guide.
  2. Create a Databricks Fabric following these steps.
  3. Create an EMR Fabric with Livy step by step here.

What's next

To learn more about Fabrics and the Prophecy Libraries, see the Prophecy Libraries documentation.