Prophecy Build Tool (PBT)

Prophecy-build-tool (PBT) allows you to quickly build, test, and deploy projects generated by Prophecy (your standard Spark Scala and PySpark Pipelines), integrating with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).

Features (v1.1.0)

  • Build Pipelines (all, or a specified subset) in Prophecy projects (Scala and Python)
  • Unit test Pipelines in Prophecy projects (Scala and Python)
  • Deploy Jobs with built Pipelines on Databricks
  • Deploy Jobs filtered by Fabric IDs on Databricks
  • Integrate with CI/CD tools like GitHub Actions
  • Verify the project structure of Prophecy projects
  • Deploy Pipeline Configurations

Requirements

  • Python >=3.7 (Recommended 3.9.13)
  • pip
  • pyspark (Recommended 3.3.0)

Installation

To install PBT, simply run:

pip3 install prophecy-build-tool
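
You can confirm the installation with standard pip tooling (the version shown will vary with the release you installed):

pip3 show prophecy-build-tool
pbt --help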

Integration Examples

GitHub Actions

Jenkins
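
Whichever CI system you use, the core steps are the same. A minimal sketch of the commands a CI job might run (the secret names below are hypothetical and would come from your CI system's credential store):

# Install PBT on the CI runner, then test and deploy the checked-out project.
pip3 install prophecy-build-tool
export DATABRICKS_HOST="$CI_DATABRICKS_HOST"
export DATABRICKS_TOKEN="$CI_DATABRICKS_TOKEN"
pbt test --path .
pbt deploy --path . --release-version 1.0 --project-id 10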

Quickstart

Usage

Usage: pbt [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  build
  deploy
  test

Running locally

The PBT CLI can be used to build, test, and deploy projects created by Prophecy that are present in your local filesystem.

Please make sure the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set and point to your Databricks workspace before running any PBT commands. Example:

export DATABRICKS_HOST="https://example_databricks_host.cloud.databricks.com"
export DATABRICKS_TOKEN="exampledatabrickstoken"
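
Before running PBT, you can sanity-check the credentials against any authenticated Databricks REST endpoint, for example (this uses the plain Databricks REST API and is not part of PBT):

curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" "$DATABRICKS_HOST/api/2.0/clusters/list"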

Building Pipelines and deploying Jobs

PBT can build and deploy Jobs inside your Prophecy project to the Databricks environment defined by the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.

Since v1.0.3, PBT supports input parameters that determine the DBFS path where your project's artifacts are uploaded: --release-version and --project-id. Their values replace the __PROJECT_RELEASE_VERSION_PLACEHOLDER__ and __PROJECT_ID_PLACEHOLDER__ placeholders already present in your Job's definition file (databricks-job.json). Using a unique release version of your choice and the project's Prophecy ID (as seen in the project's URL in the Prophecy UI) is recommended.
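
To see where these placeholders sit before substitution, you can search the Job definition files directly (a quick, optional sanity check; the path is a placeholder):

grep -R "PLACEHOLDER" /path/to/your/prophecy_project/jobs/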

Build command
pbt build --path /path/to/your/prophecy_project/
  • PBT lets you filter which Pipelines to build, which can save significant time in projects with a large number of Pipelines.
  • Multiple Pipelines can be passed comma-separated. To build only certain Pipelines:
pbt build --pipelines customers_orders,join_agg_sort --path /path/to/your/prophecy_project/
  • By default, pbt build fails (exit code 1) if any Pipeline fails to build, whether because the Pipeline is corrupt or because of a build error; this makes it easy to gate CI on the build, as shown in the sketch after this list.
  • To continue despite such errors, use the --ignore-build-errors and --ignore-parse-errors flags:
  • --ignore-build-errors skips package build failures
  • --ignore-parse-errors skips project parsing failures
pbt build --path /path/to/your/prophecy_project/ --ignore-build-errors --ignore-parse-errors
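
Because pbt build exits non-zero on failure, it can gate a release script directly. A minimal sketch (the script around the pbt call is illustrative, not part of PBT):

# Abort early if any Pipeline fails to build.
if ! pbt build --path /path/to/your/prophecy_project/; then
  echo "Pipeline build failed; aborting release." >&2
  exit 1
fi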
Deploy command
pbt deploy --path /path/to/your/prophecy_project/ --release-version 1.0 --project-id 10

Sample output:

Prophecy-build-tool v1.0.4.1

Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)

Building 1 pipelines 🚰

Building pipeline pipelines/customers_orders [1/1]

✅ Build complete!

Deploying 1 jobs

Deploying job jobs/daily [1/1]
Uploading customers_orders-1.0-py3-none-any.whl to
dbfs:/FileStore/prophecy/artifacts/...
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: daily

✅ Deployment completed successfully!

The deploy command also supports an advanced option, --dependent-projects-path, for cases where projects other than the main project being deployed must also be built. This is useful when dependent Pipelines' source code has been cloned into a separate directory accessible to PBT while deploying the main project. The option accepts a single path as its argument, but that path may contain multiple Prophecy projects in different subdirectories.

Example deploy command:

pbt deploy --path /path/to/your/prophecy_project/ --release-version 1.0 --project-id 10 --dependent-projects-path /path/to/dependent/prophecy/projects
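
For example, the dependent projects directory might be laid out as follows (an illustrative layout with hypothetical project names), with each subdirectory containing its own pbt_project.yml:

/path/to/dependent/prophecy/projects/
├── shared_ingestion/
│   └── pbt_project.yml
└── common_transforms/
    └── pbt_project.yml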

The deploy command also supports an advanced option, --fabric-ids (comma-separated if more than one), for deploying only the Jobs associated with certain Fabric IDs. This option is often used in a multi-workspace environment. To find a Fabric's ID, navigate to that Fabric's Metadata page and observe the URL.

The following command deploys only the Jobs associated with the given Fabric IDs. Example deploy:

pbt deploy --skip-builds --fabric-ids 647,1527 --path /path/to/your/prophecy_project/

Sample output:

Project name: HelloWorld
Found 2 jobs: ashish-TestJob2, ashish-TestJob
Found 4 pipelines: customers_orders (python), report_top_customers (python), join_agg_sort (python),
farmers-markets-irs (python)
[SKIP]: Skipping builds for all pipelines as '--skip-builds' flag is passed.

Deploying 2 jobs
Deploying jobs only for given Fabric IDs: ['647', '1527']

[START]: Deploying job jobs/TestJob2 [1/2]
[DEPLOY]: Job being deployed for fabric id: 1527
Pipeline pipelines/farmers-markets-irs might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/report_top_customers might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: ashish-TestJob2

[START]: Deploying job jobs/TestJob [2/2]
[DEPLOY]: Job being deployed for fabric id: 647
Pipeline pipelines/customers_orders might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/join_agg_sort might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/report_top_customers might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: ashish-TestJob

✅ Deployment completed successfully!

By default, the deploy command builds all Pipelines and then deploys them. To skip building the Pipelines (useful if you are running deploy after a previous deploy or build), pass --skip-builds:

pbt deploy --skip-builds --path /path/to/your/prophecy_project/
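
A typical two-step flow builds and tests once, then deploys the already-built artifacts without rebuilding (all flags used here are documented above; paths are placeholders):

pbt build --path /path/to/your/prophecy_project/
pbt test --path /path/to/your/prophecy_project/
pbt deploy --skip-builds --path /path/to/your/prophecy_project/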
Deploy specific Jobs using a Job ID filter

By default, the deploy command builds all Pipelines and then deploys all Jobs. To deploy only specific Jobs, use the --job-ids filter (the Job ID can be found on the Job's metadata page). PBT automatically works out all the Pipelines those Jobs need and builds only those. This can be very useful when a project has many Jobs and only a few need to be deployed.

pbt deploy --path /path/to/your/prophecy_project/ --job-ids "TestJob1"
  • Multiple comma-separated Job IDs can also be passed:
pbt deploy --path /path/to/your/prophecy_project/ --job-ids "TestJob1,TestJob2"

Complete list of options for PBT deploy:

pbt deploy --help
Prophecy-build-tool v1.0.4.1

Usage: pbt deploy [OPTIONS]

Options:
  --path TEXT                     Path to the directory containing the
                                  pbt_project.yml file  [required]
  --dependent-projects-path TEXT  Dependent projects path
  --release-version TEXT          Release version to be used during
                                  deployments
  --project-id TEXT               Project Id placeholder to be used during
                                  deployments
  --prophecy-url TEXT             Prophecy URL placeholder to be used during
                                  deployments
  --fabric-ids TEXT               Fabric IDs(comma separated) which can be
                                  used to filter jobs for deployments
  --skip-builds                   Flag to skip building Pipelines
  --help                          Show this message and exit.

Running all unit tests in project

PBT supports running unit tests inside the Prophecy project. Unit tests run with the default configuration present in the Pipeline's configs/resources/config directory.

To run all unit tests present in the project, use the test command as follows:

pbt test --path /path/to/your/prophecy_project/

Sample output:

Prophecy-build-tool v1.0.1

Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)

Unit Testing pipeline pipelines/customers_orders [1/1]

============================= test session starts ==============================
platform darwin -- Python 3.8.9, pytest-7.1.2, pluggy-1.0.0 -- /Library/Developer/CommandLineTools/usr/bin/python
cachedir: .pytest_cache
metadata: None
rootdir: /path/to/your/prophecy_project/pipelines/customers_orders/code
plugins: html-3.1.1, metadata-2.0.2
collecting ... collected 1 item

test/TestSuite.py::CleanupTest::test_unit_test_0 PASSED [100%]

============================== 1 passed in 17.42s ==============================

✅ Unit test for pipeline: pipelines/customers_orders succeeded.

Users can also pass --driver-library-path to the pbt test command to supply the Prophecy-libs dependency JARs. If it is not provided, the tool picks up the libraries from Maven Central by default.

pbt test --path /path/to/your/prophecy_project/ --driver-library-path <path_to_the_jars>
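
For example, on a machine without internet access you might point the flag at a directory of pre-downloaded JARs (the directory below is hypothetical):

pbt test --path /path/to/your/prophecy_project/ --driver-library-path /opt/prophecy/jars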

Validating project

PBT supports validating all Pipelines inside a Prophecy project, which lets users check Pipelines before deploying. Validation checks whether the Pipelines have any diagnostics; these are the same diagnostics shown in the Prophecy visual IDE.

To validate all pipelines present in the project, use the validate command as follows:

pbt validate --path /path/to/your/prophecy_project/

Sample output:

Prophecy-build-tool v1.0.3.4

Project name: HelloWorld
Found 1 jobs: default_schedule
Found 4 pipelines: customers_orders (python), report_top_customers (python), join_agg_sort (python), farmers-markets-irs (python)

Validating 4 pipelines

Validating pipeline pipelines/customers_orders [1/4]

Pipeline is validated: customers_orders

Validating pipeline pipelines/report_top_customers [2/4]

Pipeline is validated: report_top_customers

Validating pipeline pipelines/join_agg_sort [3/4]

Pipeline is validated: join_agg_sort

Validating pipeline pipelines/farmers-markets-irs [4/4]

Pipeline is validated: farmers-markets-irs
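
Putting it together, validate and test can gate a deploy in a single pre-release script. A minimal sketch using only the commands documented above (the host, token, and paths are placeholders):

#!/bin/sh
set -e  # stop at the first failing command

export DATABRICKS_HOST="https://example_databricks_host.cloud.databricks.com"
export DATABRICKS_TOKEN="exampledatabrickstoken"

PROJECT=/path/to/your/prophecy_project/

pbt validate --path "$PROJECT"   # fail fast on Pipeline diagnostics
pbt test --path "$PROJECT"       # run all unit tests
pbt deploy --path "$PROJECT" --release-version 1.0 --project-id 10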