Prophecy Build Tool (PBT)
Prophecy Build Tool (PBT) lets you quickly build, test, and deploy projects generated by Prophecy (your standard Spark Scala and PySpark Pipelines) and integrate them with your own CI/CD (e.g., GitHub Actions), build system (e.g., Jenkins), and orchestration (e.g., Databricks Workflows).
Features (v1.1.0)
- Build Pipelines (all, or only the ones you specify) in Prophecy projects (Scala and Python)
- Unit test Pipelines in Prophecy projects (Scala and Python)
- Deploy Jobs with built Pipelines on Databricks
- Deploy Jobs filtered by Fabric IDs on Databricks
- Integrate with CI/CD tools like GitHub Actions
- Verify the project structure of Prophecy projects
- Deploy Pipeline Configurations
Requirements
- Python >=3.7 (Recommended 3.9.13)
- pip
- pyspark (Recommended 3.3.0)
Installation
To install PBT, simply run:
pip3 install prophecy-build-tool
Integration Examples
GitHub Actions
Jenkins
Quickstart
Usage
Usage: pbt [OPTIONS] COMMAND [ARGS]...
Options:
--help Show this message and exit.
Commands:
build
deploy
test
Running locally
The PBT CLI can build, test, and deploy projects created by Prophecy that are present on your local filesystem.
Before running any PBT commands, make sure the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables are set to point to your Databricks workspace. For example:
export DATABRICKS_HOST="https://example_databricks_host.cloud.databricks.com"
export DATABRICKS_TOKEN="exampledatabrickstoken"
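Before invoking PBT from a script or CI job, you may want to fail fast when either variable is missing. The following Python sketch shows one way to do that. The helper name missing_databricks_vars is hypothetical and is not part of PBT.

```python
import os
from typing import List, Mapping

# The two variables the PBT commands on this page expect to be set.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example pre-flight check in a CI script:
#   missing = missing_databricks_vars()
#   if missing:
#       raise SystemExit("Set before running pbt: " + ", ".join(missing))
```

Because the function accepts any mapping, you can test it with a plain dict instead of the real os.environ.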
Building Pipelines and deploying Jobs
PBT can build and deploy Jobs inside your Prophecy project to the Databricks environment defined by the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.
Since v1.0.3, PBT supports two input parameters that determine the DBFS path where your project's artifacts are uploaded: --release-version and --project-id. Their values replace the __PROJECT_RELEASE_VERSION_PLACEHOLDER__ and __PROJECT_ID_PLACEHOLDER__ placeholders that are already present in your Job's definition file (databricks-job.json). We recommend using a unique release version of your choice and the project's Prophecy ID (as seen in the project's URL in the Prophecy UI).
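Conceptually, the two parameters act as string substitutions on the job definition. The Python sketch below illustrates that effect on a made-up JSON fragment. It is not PBT's actual implementation. The artifact_path field and its dbfs:/example/ prefix are invented for illustration and do not reflect the real databricks-job.json layout.

```python
import json

# Placeholder names as they appear in databricks-job.json.
RELEASE_VERSION_PLACEHOLDER = "__PROJECT_RELEASE_VERSION_PLACEHOLDER__"
PROJECT_ID_PLACEHOLDER = "__PROJECT_ID_PLACEHOLDER__"


def fill_job_placeholders(job_json: str, release_version: str, project_id: str) -> dict:
    """Replace both placeholders in a job definition string and parse the result."""
    filled = job_json.replace(RELEASE_VERSION_PLACEHOLDER, release_version)
    filled = filled.replace(PROJECT_ID_PLACEHOLDER, project_id)
    return json.loads(filled)


# Made-up fragment (hypothetical field and path); the real job file differs.
example_job = json.dumps({
    "name": "daily",
    "artifact_path": "dbfs:/example/__PROJECT_ID_PLACEHOLDER__/__PROJECT_RELEASE_VERSION_PLACEHOLDER__/",
})
job = fill_job_placeholders(example_job, release_version="1.0", project_id="10")
print(job["artifact_path"])  # dbfs:/example/10/1.0/
```

This also shows why a unique release version matters: each distinct value yields a distinct artifact path, so one release does not overwrite another.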
Build command
pbt build --path /path/to/your/prophecy_project/
- PBT lets you filter which pipelines to build, which can save a lot of time if your project has a large number of pipelines.
- To build only certain pipelines, pass them to --pipelines as a comma-separated list:
pbt build --pipelines customers_orders,join_agg_sort --path /path/to/your/prophecy_project/
- By default, pbt build fails (exit code 1) if any Pipeline build fails, whether because of a corrupt Pipeline or a build failure.
- To continue despite these errors, use the --ignore-build-errors and --ignore-parse-errors flags: --ignore-build-errors skips package build failures, and --ignore-parse-errors skips project parsing failures.
pbt build --path /path/to/your/prophecy_project/ --ignore-build-errors --ignore-parse-errors
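If you drive PBT from a Python script in CI, the exit-code behavior above is what lets a failed build stop the job. The sketch below assumes pbt is on PATH. The helper names pbt_build_argv and step_succeeded are hypothetical, and the sketch uses only the flags shown in this section.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def pbt_build_argv(project_path: str,
                   pipelines: Optional[Sequence[str]] = None,
                   ignore_errors: bool = False) -> List[str]:
    """Assemble a `pbt build` command line from the options documented above."""
    argv = ["pbt", "build"]
    if pipelines:
        if any("," in name for name in pipelines):
            # A comma inside a name would be read as a separator by --pipelines.
            raise ValueError("pipeline names must not contain commas")
        argv += ["--pipelines", ",".join(pipelines)]
    argv += ["--path", project_path]
    if ignore_errors:
        argv += ["--ignore-build-errors", "--ignore-parse-errors"]
    return argv


def step_succeeded(argv: Sequence[str]) -> bool:
    """Run one command and report whether it exited with code 0."""
    return subprocess.run(list(argv)).returncode == 0


# Example CI usage (requires pbt to be installed):
#   if not step_succeeded(pbt_build_argv("/path/to/your/prophecy_project/",
#                                        ["customers_orders", "join_agg_sort"])):
#       sys.exit(1)
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before running it.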
Deploy command
pbt deploy --path /path/to/your/prophecy_project/ --release-version 1.0 --project-id 10
Sample output:
Prophecy-build-tool v1.0.4.1
Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)
Building 1 pipelines 🚰
Building pipeline pipelines/customers_orders [1/1]
✅ Build complete!
Deploying 1 jobs ⏱
Deploying job jobs/daily [1/1]
Uploading customers_orders-1.0-py3-none-any.whl to
dbfs:/FileStore/prophecy/artifacts/...
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: daily
✅ Deployment completed successfully!
The deploy command also supports an advanced option, --dependent-projects-path, for when you need to build projects other than the main project being deployed.
This is useful when there are dependent Pipelines whose source code can be cloned into a separate directory that PBT can access while running deploy for the main project. This option accepts only one path, but that path can contain multiple Prophecy projects in different subdirectories.
Example deploy command:
pbt deploy --path /path/to/your/prophecy_project/ --release-version 1.0 --project-id 10 --dependent-projects-path /path/to/dependent/prophecy/projects
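Before running deploy, it can help to check which subdirectories of the dependent-projects path look like separate projects. The sketch below assumes that each Prophecy project is identified by the pbt_project.yml file that the --path option refers to. It is a hypothetical inspection helper, not how PBT itself discovers projects.

```python
from pathlib import Path
from typing import List


def find_prophecy_projects(root: str) -> List[Path]:
    """Return every directory under `root` that contains a pbt_project.yml file.

    Assumption: a project root is marked by pbt_project.yml, as in the
    --path option's help text.
    """
    return sorted(marker.parent for marker in Path(root).rglob("pbt_project.yml"))


# Example: list candidate projects before running deploy.
#   for project in find_prophecy_projects("/path/to/dependent/prophecy/projects"):
#       print(project)
```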
The deploy command also supports an advanced option, --fabric-ids (comma-separated if more than one), for when you need to deploy only the Jobs associated with certain Fabric IDs. This option is often used in multi-workspace environments.
To find the Fabric ID for your Fabric, navigate to that Fabric's Metadata page and look at the URL.
The following command deploys only the Jobs associated with the given Fabric IDs:
pbt deploy --fabric-ids 647,1527 --path /path/to/your/prophecy_project/
Sample output:
Project name: HelloWorld
Found 2 jobs: ashish-TestJob2, ashish-TestJob
Found 4 pipelines: customers_orders (python), report_top_customers (python), join_agg_sort (python), farmers-markets-irs (python)
[SKIP]: Skipping builds for all pipelines as '--skip-builds' flag is passed.
Deploying 2 jobs
Deploying jobs only for given Fabric IDs: ['647', '1527']
[START]: Deploying job jobs/TestJob2 [1/2]
[DEPLOY]: Job being deployed for fabric id: 1527
Pipeline pipelines/farmers-markets-irs might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/report_top_customers might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: ashish-TestJob2
[START]: Deploying job jobs/TestJob [2/2]
[DEPLOY]: Job being deployed for fabric id: 647
Pipeline pipelines/customers_orders might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/join_agg_sort might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Pipeline pipelines/report_top_customers might be shared, checking if it exists in DBFS
Dependent package exists on DBFS already, continuing with next pipeline
Querying existing jobs to find current job: Offset: 0, Pagesize: 25
Updating an existing job: ashish-TestJob
✅ Deployment completed successfully!
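The Fabric filter keeps only the Jobs whose Fabric ID appears in the comma-separated list. The sample output shows the parsed IDs as strings. The Python sketch below illustrates that filtering step and assumes each Job's Fabric ID is already known. The job_fabrics dictionary and both helper names are hypothetical; they are not PBT's internals.

```python
from typing import Dict, List


def parse_id_list(value: str) -> List[str]:
    """Split a comma-separated option value, e.g. "647,1527" -> ["647", "1527"]."""
    return [part.strip() for part in value.split(",") if part.strip()]


def jobs_for_fabrics(job_fabrics: Dict[str, str], fabric_ids: str) -> List[str]:
    """Return the jobs (in their original order) whose Fabric ID was requested."""
    wanted = set(parse_id_list(fabric_ids))
    return [job for job, fabric in job_fabrics.items() if fabric in wanted]


# Hypothetical job -> Fabric ID mapping, loosely based on the sample output above.
job_fabrics = {"ashish-TestJob2": "1527", "ashish-TestJob": "647", "other-job": "12"}
print(jobs_for_fabrics(job_fabrics, "647,1527"))  # ['ashish-TestJob2', 'ashish-TestJob']
```

Comparing IDs as strings avoids surprises with values that are not purely numeric.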
By default, the deploy command builds all pipelines and then deploys them. To skip building the pipelines (useful, for example, if you already ran a deploy or build command previously), pass the --skip-builds flag:
pbt deploy --skip-builds --path /path/to/your/prophecy_project/
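One way to use --skip-builds is to build once as a separate CI step and then deploy without rebuilding. The sketch below runs the two commands in order and stops at the first failure. It relies on pbt build exiting with a non-zero code when a build fails, as described above. The helper names are hypothetical, and a real deploy step may also need --release-version and --project-id.

```python
import subprocess
from typing import List, Sequence


def build_then_deploy_steps(project_path: str) -> List[List[str]]:
    """Build all pipelines once, then deploy without rebuilding."""
    return [
        ["pbt", "build", "--path", project_path],
        ["pbt", "deploy", "--skip-builds", "--path", project_path],
    ]


def run_until_failure(steps: Sequence[Sequence[str]]) -> int:
    """Run commands in order; return the first non-zero exit code, or 0 if all succeed."""
    for argv in steps:
        code = subprocess.run(list(argv)).returncode
        if code != 0:
            return code  # later steps (e.g. deploy) are not attempted
    return 0


# Example CI usage (requires pbt to be installed):
#   raise SystemExit(run_until_failure(build_then_deploy_steps("/path/to/your/prophecy_project/")))
```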
Deploy specific Jobs using JobId filter
By default, the deploy command builds all pipelines and then deploys all jobs. To deploy only specific jobs, use the --job-ids filter (you can find the Job ID on the Job's metadata page). PBT automatically determines all the pipelines the selected jobs need and then builds them.
This is useful if you have many jobs and only want to deploy a few.
pbt deploy --path /path/to/your/prophecy_project/ --job-ids "TestJob1"
- You can also pass multiple comma-separated Job IDs:
pbt deploy --path /path/to/your/prophecy_project/ --job-ids "TestJob1,TestJob2"
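The Job ID filter amounts to taking the union of the pipelines each selected Job uses. The sketch below illustrates that calculation with a hypothetical job_pipelines mapping. It does not show how PBT reads this information from the project. Raising KeyError for an unknown Job ID is a design choice of the sketch, not documented PBT behavior.

```python
from typing import Dict, List, Set


def pipelines_for_jobs(job_pipelines: Dict[str, List[str]], job_ids: str) -> List[str]:
    """Return the sorted, de-duplicated pipelines used by the comma-separated job IDs."""
    needed: Set[str] = set()
    for job_id in (part.strip() for part in job_ids.split(",") if part.strip()):
        needed.update(job_pipelines[job_id])  # KeyError for an unknown job ID
    return sorted(needed)


# Hypothetical mapping of Job IDs to the pipelines they run.
job_pipelines = {
    "TestJob1": ["customers_orders", "report_top_customers"],
    "TestJob2": ["report_top_customers", "join_agg_sort"],
    "TestJob3": ["farmers-markets-irs"],
}
print(pipelines_for_jobs(job_pipelines, "TestJob1,TestJob2"))
# ['customers_orders', 'join_agg_sort', 'report_top_customers']
```

A pipeline shared by several selected jobs (report_top_customers here) appears only once, so it would be built only once.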
List of options for pbt deploy (help output from Prophecy-build-tool v1.0.4.1):
pbt deploy --help
Prophecy-build-tool v1.0.4.1
Usage: pbt deploy [OPTIONS]
Options:
--path TEXT Path to the directory containing the
pbt_project.yml file [required]
--dependent-projects-path TEXT Dependent projects path
--release-version TEXT Release version to be used during
deployments
--project-id TEXT Project Id placeholder to be used during
deployments
--prophecy-url TEXT Prophecy URL placeholder to be used during
deployments
--fabric-ids TEXT Fabric IDs(comma separated) which can be
used to filter jobs for deployments
--skip-builds Flag to skip building Pipelines
--help Show this message and exit.
Running all unit tests in a project
PBT supports running unit tests inside the Prophecy project. Unit tests run with the default configuration present in the Pipeline's configs/resources/config directory.
To run all unit tests present in the project, use the test command as follows:
pbt test --path /path/to/your/prophecy_project/
Sample output:
Prophecy-build-tool v1.0.1
Found 1 jobs: daily
Found 1 pipelines: customers_orders (python)
Unit Testing pipeline pipelines/customers_orders [1/1]
============================= test session starts ==============================
platform darwin -- Python 3.8.9, pytest-7.1.2, pluggy-1.0.0 -- /Library/Developer/CommandLineTools/usr/bin/python
cachedir: .pytest_cache
metadata: None
rootdir: /path/to/your/prophecy_project/pipelines/customers_orders/code
plugins: html-3.1.1, metadata-2.0.2
collecting ... collected 1 item
test/TestSuite.py::CleanupTest::test_unit_test_0 PASSED [100%]
============================== 1 passed in 17.42s ==============================
✅ Unit test for pipeline: pipelines/customers_orders succeeded.
You can also pass --driver-library-path to the pbt test command to supply the JARs for Prophecy-libs dependencies. If you don't set it, the tool picks up these libraries from Maven Central by default.
pbt test --path /path/to/your/prophecy_project/ --driver-library-path <path_to_the_jars>
Validating project
PBT supports validating all pipelines inside the Prophecy project, which lets you check pipelines before deploying them. Validation checks whether the pipelines have any diagnostics; these are the same diagnostics shown in our visual IDE.
To validate all pipelines present in the project, use the validate command as follows:
pbt validate --path /path/to/your/prophecy_project/
Sample output:
Prophecy-build-tool v1.0.3.4
Project name: HelloWorld
Found 1 jobs: default_schedule
Found 4 pipelines: customers_orders (python), report_top_customers (python), join_agg_sort (python), farmers-markets-irs (python)
Validating 4 pipelines
Validating pipeline pipelines/customers_orders [1/4]
Pipeline is validated: customers_orders
Validating pipeline pipelines/report_top_customers [2/4]
Pipeline is validated: report_top_customers
Validating pipeline pipelines/join_agg_sort [3/4]
Pipeline is validated: join_agg_sort
Validating pipeline pipelines/farmers-markets-irs [4/4]
Pipeline is validated: farmers-markets-irs
What's next
To continue using PBT, see the following pages:
📄️ PBT on GitHub Actions
Example usage of Prophecy Build Tool on GitHub Actions
📄️ PBT on Jenkins
Example usage of Prophecy Build Tool on Jenkins