Prophecy Build Tool

Prophecy-build-tool (PBT) allows you to quickly build, test, and deploy projects generated by Prophecy (your standard Spark Scala and PySpark Pipelines), and to integrate with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).

Features (v1.0.1)

  • Build and unit test all Pipelines in Prophecy projects (Scala and Python)
  • Deploy Jobs with built Pipelines on Databricks
  • Integrate with CI/CD tools like GitHub Actions
  • Verify the project structure of Prophecy projects

Requirements

  • Python >=3.6
  • pip

Installation

To install PBT, simply run:

pip3 install prophecy-build-tool
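
If you prefer to keep PBT isolated from system-wide packages, it can also be installed into a virtual environment (a minimal sketch; the directory name .venv is just an example):

# Create and activate a virtual environment, then install PBT into it
python3 -m venv .venv
source .venv/bin/activate
pip3 install prophecy-build-tool

# Verify the installation
pbt --help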

Quickstart

Usage

Usage: pbt [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  build
  deploy
  test

Running locally

The PBT CLI can be used to build, test, and deploy projects created by Prophecy that are present on your local filesystem.

Please make sure the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set and pointing to your Databricks workspace before running any PBT commands. Example:

export DATABRICKS_HOST="https://example_databricks_host.cloud.databricks.com"
export DATABRICKS_TOKEN="exampledatabrickstoken"
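
Because a missing variable typically only surfaces once PBT contacts the workspace, scripts that wrap PBT can fail fast with standard shell parameter checks (a small sketch, not part of PBT itself):

# Abort with a clear error if either variable is unset or empty
: "${DATABRICKS_HOST:?DATABRICKS_HOST must be set to your Databricks workspace URL}"
: "${DATABRICKS_TOKEN:?DATABRICKS_TOKEN must be set to a valid access token}"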

Building Pipelines and deploying Jobs

pbt deploy --path /path/to/your/prophecy_project/

Sample output:

Prophecy-build-tool v1.0.1

Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)

Building 1 pipelines 🚰

Building pipeline pipelines/customers_orders [1/1]

✅ Build complete!

Deploying 1 jobs

Deploying job jobs/daily [1/1]
Uploading customers_orders-1.0-py3-none-any.whl to
dbfs:/FileStore/prophecy/artifacts/...
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: daily

✅ Deployment completed successfully!
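
As the output above shows, deploy builds the Pipelines before uploading them. The build step can also be run on its own, for example to validate that a project compiles without touching Databricks (the same command the CI workflow later on this page uses):

pbt build --path /path/to/your/prophecy_project/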

Running all unit tests in project

Running unit tests requires the FABRIC_NAME environment variable to be set; it is used to pick the correct configuration for running the unit tests. Example:

export FABRIC_NAME="dev"
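
If the fabric is only needed for a single run, the variable can also be set inline for that one invocation (standard shell behavior rather than a PBT option):

FABRIC_NAME="dev" pbt test --path /path/to/your/prophecy_project/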

To run all unit tests present in the project, use the test command as follows:

pbt test --path /path/to/your/prophecy_project/

Sample output:

Prophecy-build-tool v1.0.1

Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)

Unit Testing pipeline pipelines/customers_orders [1/1]

============================= test session starts ==============================
platform darwin -- Python 3.8.9, pytest-7.1.2, pluggy-1.0.0 -- /Library/Developer/CommandLineTools/usr/bin/python
cachedir: .pytest_cache
metadata: None
rootdir: /path/to/your/prophecy_project/pipelines/customers_orders/code
plugins: html-3.1.1, metadata-2.0.2
collecting ... collected 1 item

test/TestSuite.py::CleanupTest::test_unit_test_0 PASSED [100%]

============================== 1 passed in 17.42s ==============================

✅ Unit test for pipeline: pipelines/customers_orders succeeded.

Integrating with Github Actions

PBT can be integrated with your own CI/CD solution to build, test, and deploy Prophecy code. The steps for setting up PBT with GitHub Actions on a repository containing a Prophecy project are described below.

Prerequisite

  • A Prophecy project that is currently hosted in a GitHub repository

Setting up environment variables and secrets

PBT requires the environment variables DATABRICKS_HOST, DATABRICKS_TOKEN, and FABRIC_NAME to be set for complete functionality.

Setting DATABRICKS_TOKEN as a secret in GitHub

The DATABRICKS_TOKEN to be used can be set as a secret inside the GitHub repository of the project. Steps:

  • Go to Settings > Secrets > Actions from the repository menu
  • Click ‘New repository secret’
  • Add the secret with the name DATABRICKS_TOKEN and, as its value, the Databricks token to be used by PBT.

(Screenshot: the repository’s Actions secrets page after the DATABRICKS_TOKEN secret has been added.)
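
If you use the GitHub CLI, the same secret can be added from a terminal instead of the web UI (a sketch assuming gh is installed and authenticated against the repository):

# Store the token as an Actions secret named DATABRICKS_TOKEN
gh secret set DATABRICKS_TOKEN --body "exampledatabrickstoken"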

The environment variables can now all be set within the GitHub Actions YAML file as follows:

env:
  DATABRICKS_HOST: "https://sample_databricks_url.cloud.databricks.com"
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  FABRIC_NAME: "dev"

The complete YAML file definition is discussed in the next section.

Setting up a GitHub Actions workflow on every push to the main branch

We’re now ready to set up CI/CD on the Prophecy project. To set up a workflow that builds, runs all unit tests, and then deploys the built .jar (Scala) / .whl (Python) to Databricks automatically on every push to main:

  • Create a YAML file in the project repository at the below location (relative to the root)
.github/workflows/exampleWorkflow.yml
  • Add the below contents to exampleWorkflow.yml
name: Example CI/CD with Github actions

on:
  push:
    branches:
      - "main"

env:
  DATABRICKS_HOST: "https://sample_databricks_url.cloud.databricks.com"
  DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
  FABRIC_NAME: "dev"

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
      - name: Set up JDK 11
        uses: actions/setup-java@v3
        with:
          java-version: "11"
          distribution: "adopt"
      - name: Set up Python 3.x
        uses: actions/setup-python@v4
        with:
          python-version: "3.x"
      # Install all python dependencies
      # prophecy-libs not included here because prophecy-build-tool takes care of it by reading each pipeline's setup.py
      - name: Install dependencies
        run: |
          python3 -m pip install --upgrade pip
          pip3 install build pytest wheel pytest-html pyspark prophecy-build-tool
      - name: Run PBT build
        run: pbt build --path .
      - name: Run PBT test
        run: pbt test --path .
      - name: Run PBT deploy
        run: pbt deploy --path .

The above workflow does the following in order:

  1. Triggers on every change that is pushed to the branch ‘main’
  2. Sets the environment variables required for PBT to run: DATABRICKS_HOST, DATABRICKS_TOKEN and FABRIC_NAME
  3. Sets up JDK 11, Python 3 and other dependencies required for PBT to run
  4. Builds all the Pipelines present in the project and generates a .jar/.whl file. If the build fails at any point, a non-zero exit code is returned, which stops the workflow from proceeding further and marks the run as a failure.
  5. Runs all the unit tests present in the project using FABRIC_NAME as the configuration. If any of the unit tests fail, a non-zero exit code is returned, which stops the workflow from proceeding further and marks the run as a failure.
  6. Deploys the built .jar/.whl to the Databricks location defined in the databricks-job.json file in the jobs directory of the project. If the Job already exists in Databricks, it is updated with the new .jar/.whl. If this process fails at any step, a non-zero exit code is returned, which stops the workflow from proceeding further and marks the run as a failure.
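
Because each PBT command exits non-zero on failure, a failing workflow run can be reproduced locally with the same sequence of commands (a sketch; it assumes DATABRICKS_HOST, DATABRICKS_TOKEN, and FABRIC_NAME are exported as described above, and is run from the repository root):

# Mirror the CI steps; && stops at the first failing command
pip3 install build pytest wheel pytest-html pyspark prophecy-build-tool
pbt build --path . && pbt test --path . && pbt deploy --path .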