Fabrics
A Fabric is a logical execution environment. Teams typically organize their data engineering into multiple environments, such as development, staging, and production, with a Fabric for each.
Common Usage Pattern
- Admin sets up a Prophecy account and creates a dev Fabric for development and a prod Fabric for production
- Admin adds a team for the Marketing Decision Support System (Marketing_DSS)
- Users in the Marketing_DSS Team have access to the dev Fabric for development and the prod Fabric for production Pipelines
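The pattern above can be sketched as a small data model. This is only an illustration of the relationships described, not Prophecy's API: the Fabric and Team classes, the can_use helper, and the user name alice are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fabric:
    """Hypothetical stand-in for a Fabric: a named execution environment."""
    name: str
    purpose: str


@dataclass
class Team:
    """Hypothetical stand-in for a team and the Fabrics it may use."""
    name: str
    fabrics: set = field(default_factory=set)
    members: set = field(default_factory=set)

    def can_use(self, user: str, fabric_name: str) -> bool:
        # A user reaches a Fabric only through membership in a team
        # that has been given access to that Fabric.
        fabric_names = {fabric.name for fabric in self.fabrics}
        return user in self.members and fabric_name in fabric_names


# The admin creates one Fabric per environment...
dev = Fabric("dev", "development")
prod = Fabric("prod", "production")

# ...and a team whose members can use both.
marketing = Team("Marketing_DSS", fabrics={dev, prod}, members={"alice"})

print(marketing.can_use("alice", "dev"))
print(marketing.can_use("bob", "prod"))
```

Here access is modeled as flowing through team membership, which matches the admin, team, user sequence in the list; a real deployment may apply finer-grained permissions.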
What's in a Fabric
A Fabric includes everything required to run a data Pipeline:
- Spark Environment
  - This is a named Spark environment, owned by one team and used by one or more teams
  - It contains
    - Connection Credentials - for Databricks, this includes the Workspace URL and the Access Token
    - Cluster Configuration - for Databricks, this includes the Databricks Runtime Version, Machine Type, and Idle Timeout
    - Job Sizes - for convenience, Prophecy lets you define commonly used cluster sizes and give them names. For example, an XL cluster might be 10 servers of the i3.xlarge instance type, for a total of 40 vCPUs and roughly 305 GiB of memory (each i3.xlarge provides 4 vCPUs and 30.5 GiB)
- Scheduler
  - The Scheduler runs one or more Spark data Pipelines on a schedule (e.g. every weekday at 9am)
  - Databricks workspaces come with a default scheduler that is always available
  - Enterprise environments have the Airflow Scheduler option as well
- Database Connections
  - Data Pipelines will often read from operational databases such as MySQL or Postgres, and read from and write to Data Warehouses such as Snowflake
  - JDBC or other connections to these databases can be stored on the Fabric
- Credentials and Secrets
  - Prophecy enables you to store credentials safely in the Databricks environment. You can store key-value pairs as secrets, which running workflows can then read.
  - Note that once a secret is created, it is readable only by a running Job. Prophecy does not have access to its value.
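To make the pieces listed above concrete, here is a minimal sketch of what a Fabric bundles together. It is not Prophecy's configuration format: the JobSize and FabricConfig classes, every field name, the workspace URL, the secret key, and the JDBC URL are hypothetical placeholders. The per-server figures for i3.xlarge (4 vCPUs, 30.5 GiB) are assumed from AWS's published instance specifications, and the schedule uses standard five-field cron syntax, which may differ from the syntax a particular scheduler expects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JobSize:
    """A named cluster size (hypothetical structure, for illustration)."""
    name: str
    workers: int
    instance_type: str
    vcpus_per_worker: int
    memory_gib_per_worker: float

    @property
    def total_vcpus(self) -> int:
        return self.workers * self.vcpus_per_worker

    @property
    def total_memory_gib(self) -> float:
        return self.workers * self.memory_gib_per_worker


@dataclass
class FabricConfig:
    """Hypothetical bundle of the items a Fabric holds."""
    name: str
    workspace_url: str
    token_secret_key: str  # the name of a secret, never the token itself
    job_sizes: dict = field(default_factory=dict)
    schedule_cron: str = ""
    jdbc_urls: dict = field(default_factory=dict)

    def add_job_size(self, size: JobSize) -> None:
        self.job_sizes[size.name] = size


# Assumed per-server specs for i3.xlarge: 4 vCPUs and 30.5 GiB.
xl = JobSize("XL", workers=10, instance_type="i3.xlarge",
             vcpus_per_worker=4, memory_gib_per_worker=30.5)

dev = FabricConfig(
    name="dev",
    workspace_url="https://example.cloud.databricks.com",  # placeholder
    token_secret_key="databricks-token",                   # placeholder
    schedule_cron="0 9 * * 1-5",  # every weekday at 9am, standard cron
    jdbc_urls={"orders": "jdbc:postgresql://db.example.com:5432/orders"},
)
dev.add_job_size(xl)

print(xl.total_vcpus, xl.total_memory_gib)
```

Storing only the secret's key on the config mirrors the point that secret values are readable by running Jobs and not by Prophecy itself.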