Prophecy Build Tool (PBT)

Prophecy-build-tool (PBT) allows you to quickly build, test, and deploy projects generated by Prophecy (your standard Spark Scala and PySpark Pipelines), integrating with your own CI/CD (e.g. GitHub Actions), build system (e.g. Jenkins), and orchestration (e.g. Databricks Workflows).

Features (v1.1.0)

  • Build Pipelines (all, or a specified subset) in Prophecy projects (Scala and Python)
  • Unit test Pipelines in Prophecy projects (Scala and Python)
  • Deploy Jobs with built Pipelines on Databricks
  • Deploy Jobs filtered by Fabric IDs on Databricks
  • Integrate with CI/CD tools like GitHub Actions
  • Verify the project structure of Prophecy projects
  • Deploy Pipeline Configurations

Requirements

  • Python >=3.7 (Recommended 3.9.13)
  • pip
  • pyspark (Recommended 3.3.0)

Installation

To install PBT, simply run:

pip3 install prophecy-build-tool
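
You can confirm the installation with standard pip tooling (the version shown will vary with the release you installed):

pip3 show prophecy-build-tool
pbt --help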

Integration Examples

GitHub Actions

Jenkins
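
Whichever CI system you use, the core steps are the same. A minimal sketch of the commands a CI job might run (the secret names below are hypothetical and would come from your CI system's credential store):

# Install PBT on the CI runner, then test and deploy the checked-out project.
pip3 install prophecy-build-tool
export DATABRICKS_HOST="$CI_DATABRICKS_HOST"
export DATABRICKS_TOKEN="$CI_DATABRICKS_TOKEN"
pbt test --path .
pbt deploy --path . --release-version 1.0 --project-id 10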

Quickstart

Usage

Usage: pbt [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  build
  deploy
  test

Running locally

The PBT CLI can be used to build, test, and deploy projects created by Prophecy that are present in your local filesystem.

Please make sure the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set and point to your Databricks workspace before running any PBT commands. Example:

export DATABRICKS_HOST="https://example_databricks_host.cloud.databricks.com"
export DATABRICKS_TOKEN="exampledatabrickstoken"
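
Before running PBT, you can sanity-check the credentials against any authenticated Databricks REST endpoint, for example (this uses the plain Databricks REST API and is not part of PBT):

curl -s -H "Authorization: Bearer $DATABRICKS_TOKEN" "$DATABRICKS_HOST/api/2.0/clusters/list"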

Building Pipelines and deploying Jobs

PBT can build and deploy Jobs inside your Prophecy project to the Databricks environment defined by the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.

Since v1.0.3, PBT supports input parameters that determine the DBFS path where your project's artifacts are uploaded: --release-version and --project-id. Their values replace the __PROJECT_RELEASE_VERSION_PLACEHOLDER__ and __PROJECT_ID_PLACEHOLDER__ placeholders already present in your Job's definition file (databricks-job.json). Using a unique release version of your choice and the project's Prophecy ID (as seen in the project's URL in the Prophecy UI) is recommended.
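
To see where these placeholders sit before substitution, you can search the Job definition files directly (a quick, optional sanity check; the path is a placeholder):

grep -R "PLACEHOLDER" /path/to/your/prophecy_project/jobs/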

Build command
pbt build --path /path/to/your/prophecy_project/
  • PBT lets you filter which Pipelines to build, which can save significant time in projects with a large number of Pipelines.
  • Multiple Pipelines can be passed comma-separated. To build only certain Pipelines:
pbt build --pipelines customers_orders,join_agg_sort --path /path/to/your/prophecy_project/
  • By default, pbt build fails (exit code 1) if any Pipeline fails to build, whether because the Pipeline is corrupt or because of a build error; this makes it easy to gate CI on the build, as shown in the sketch after this list.
  • To continue despite such errors, use the --ignore-build-errors and --ignore-parse-errors flags:
  • --ignore-build-errors skips package build failures
  • --ignore-parse-errors skips project parsing failures
pbt build --path /path/to/your/prophecy_project/ --ignore-build-errors --ignore-parse-errors
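
Because pbt build exits non-zero on failure, it can gate a release script directly. A minimal sketch (the script around the pbt call is illustrative, not part of PBT):

# Abort early if any Pipeline fails to build.
if ! pbt build --path /path/to/your/prophecy_project/; then
  echo "Pipeline build failed; aborting release." >&2
  exit 1
fi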
Deploy command
pbt deploy --path /path/to/your/prophecy_project/ --release-version 1.0 --project-id 10

Sample output:

Prophecy-build-tool v1.0.4.1

Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)

Building 1 pipelines 🚰

Building pipeline pipelines/customers_orders [1/1]

✅ Build complete!

Deploying 1 jobs

Deploying job jobs/daily [1/1]
Uploading customers_orders-1.0-py3-none-any.whl to
dbfs:/FileStore/prophecy/artifacts/...
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: daily

✅ Deployment completed successfully!

The deploy command also supports an advanced option, --dependent-projects-path, for cases where projects other than the main project being deployed must also be built. This is useful when dependent Pipelines' source code has been cloned into a separate directory accessible to PBT while deploying the main project. The option accepts a single path as its argument, but that path may contain multiple Prophecy projects in different subdirectories.

Example deploy command:

pbt deploy --path /path/to/your/prophecy_project/ --release-version 1.0 --project-id 10 --dependent-projects-path /path/to/dependent/prophecy/projects
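
For example, the dependent projects directory might be laid out as follows (an illustrative layout with hypothetical project names), with each subdirectory containing its own pbt_project.yml:

/path/to/dependent/prophecy/projects/
├── shared_ingestion/
│   └── pbt_project.yml
└── common_transforms/
    └── pbt_project.yml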

The deploy command also supports an advanced option, --fabric-ids (comma-separated if more than one), for deploying only the Jobs associated with certain Fabric IDs. This option is often used in a multi-workspace environment. To find a Fabric's ID, navigate to that Fabric's Metadata page and observe the URL.

The following command deploys only the Jobs associated with the given Fabric IDs. Example deploy:

pbt deploy --skip-builds --fabric-ids 647,1527 --path /path/to/your/prophecy_project/

Sample output:

Project name: HelloWorld
Found 2 jobs: ashish-TestJob2, ashish-TestJob
Found 4 pipelines: customers_orders (python), report_top_customers (python), join_agg_sort (python),
farmers-markets-irs (python)
[SKIP]: Skipping builds for all pipelines as '--skip-builds' flag is passed.

Deploying 2 jobs
Deploying jobs only for given Fabric IDs: ['647', '1527']

[START]: Deploying job jobs/TestJob2 [1/2]
[DEPLOY]: Job being deployed for fabric id: 1527
Pipeline pipelines/farmers-markets-irs might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/report_top_customers might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: ashish-TestJob2

[START]: Deploying job jobs/TestJob [2/2]
[DEPLOY]: Job being deployed for fabric id: 647
Pipeline pipelines/customers_orders might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/join_agg_sort might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/report_top_customers might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: ashish-TestJob

✅ Deployment completed successfully!

By default, the deploy command builds all Pipelines and then deploys them. To skip building the Pipelines (useful if you are running deploy after a previous deploy or build), pass --skip-builds:

pbt deploy --skip-builds --path /path/to/your/prophecy_project/
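
A typical two-step flow builds and tests once, then deploys the already-built artifacts without rebuilding (all flags used here are documented above; paths are placeholders):

pbt build --path /path/to/your/prophecy_project/
pbt test --path /path/to/your/prophecy_project/
pbt deploy --skip-builds --path /path/to/your/prophecy_project/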
Deploy specific Jobs using a Job ID filter

By default, the deploy command builds all Pipelines and then deploys all Jobs. To deploy only specific Jobs, use the --job-ids filter (the Job ID can be found on the Job's metadata page). PBT automatically works out all the Pipelines those Jobs need and builds only those. This can be very useful when a project has many Jobs and only a few need to be deployed.

pbt deploy --path /path/to/your/prophecy_project/ --job-ids "TestJob1"
  • Multiple comma-separated Job IDs can also be passed:
pbt deploy --path /path/to/your/prophecy_project/ --job-ids "TestJob1,TestJob2"

Complete list of options for PBT deploy:

pbt deploy --help
Prophecy-build-tool v1.0.4.1

Usage: pbt deploy [OPTIONS]

Options:
  --path TEXT                     Path to the directory containing the
                                  pbt_project.yml file  [required]
  --dependent-projects-path TEXT  Dependent projects path
  --release-version TEXT          Release version to be used during
                                  deployments
  --project-id TEXT               Project Id placeholder to be used during
                                  deployments
  --prophecy-url TEXT             Prophecy URL placeholder to be used during
                                  deployments
  --fabric-ids TEXT               Fabric IDs(comma separated) which can be
                                  used to filter jobs for deployments
  --skip-builds                   Flag to skip building Pipelines
  --help                          Show this message and exit.

Running all unit tests in project

PBT supports running unit tests inside the Prophecy project. Unit tests run with the default configuration present in the Pipeline's configs/resources/config directory.

To run all unit tests present in the project, use the test command as follows:

pbt test --path /path/to/your/prophecy_project/

Sample output:

Prophecy-build-tool v1.0.1

Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)

Unit Testing pipeline pipelines/customers_orders [1/1]

============================= test session starts ==============================
platform darwin -- Python 3.8.9, pytest-7.1.2, pluggy-1.0.0 -- /Library/Developer/CommandLineTools/usr/bin/python
cachedir: .pytest_cache
metadata: None
rootdir: /path/to/your/prophecy_project/pipelines/customers_orders/code
plugins: html-3.1.1, metadata-2.0.2
collecting ... collected 1 item

test/TestSuite.py::CleanupTest::test_unit_test_0 PASSED [100%]

============================== 1 passed in 17.42s ==============================

✅ Unit test for pipeline: pipelines/customers_orders succeeded.

Users can also pass --driver-library-path to the pbt test command to supply the Prophecy-libs dependency JARs. If it is not provided, the tool picks up the libraries from Maven Central by default.

pbt test --path /path/to/your/prophecy_project/ --driver-library-path <path_to_the_jars>
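
For example, on a machine without internet access you might point the flag at a directory of pre-downloaded JARs (the directory below is hypothetical):

pbt test --path /path/to/your/prophecy_project/ --driver-library-path /opt/prophecy/jars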

Validating project

PBT supports validating all Pipelines inside a Prophecy project, which lets users check Pipelines before deploying. Validation checks whether the Pipelines have any diagnostics; these are the same diagnostics shown in the Prophecy visual IDE.

To validate all pipelines present in the project, use the validate command as follows:

pbt validate --path /path/to/your/prophecy_project/

Sample output:

Prophecy-build-tool v1.0.3.4

Project name: HelloWorld
Found 1 jobs: default_schedule
Found 4 pipelines: customers_orders (python), report_top_customers (python), join_agg_sort (python), farmers-markets-irs (python)

Validating 4 pipelines

Validating pipeline pipelines/customers_orders [1/4]

Pipeline is validated: customers_orders

Validating pipeline pipelines/report_top_customers [2/4]

Pipeline is validated: report_top_customers

Validating pipeline pipelines/join_agg_sort [3/4]

Pipeline is validated: join_agg_sort

Validating pipeline pipelines/farmers-markets-irs [4/4]

Pipeline is validated: farmers-markets-irs
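
Putting it together, validate and test can gate a deploy in a single pre-release script. A minimal sketch using only the commands documented above (the host, token, and paths are placeholders):

#!/bin/sh
set -e  # stop at the first failing command

export DATABRICKS_HOST="https://example_databricks_host.cloud.databricks.com"
export DATABRICKS_TOKEN="exampledatabrickstoken"

PROJECT=/path/to/your/prophecy_project/

pbt validate --path "$PROJECT"   # fail fast on Pipeline diagnostics
pbt test --path "$PROJECT"       # run all unit tests
pbt deploy --path "$PROJECT" --release-version 1.0 --project-id 10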