
Interactive Execution

Running a Pipeline

There are two ways to run a Pipeline interactively:

[Image: Interactive run options]

  1. Using the play button in the bottom-right corner. This executes the entire Pipeline.
  2. Using the play button on a particular Gem. This executes only the part of the Pipeline up to and including that Gem. This comes in handy during development, because we don't have to run the entire Pipeline to debug or change a particular Gem.
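To make "up to and including that Gem" concrete, here is a minimal sketch that treats a Pipeline as a directed acyclic graph of Gems. It is an illustration under stated assumptions, not Prophecy's implementation: the Gem names, the `PIPELINE` dict, and the helpers `upstream_of` and `execution_order` are hypothetical. Running from a Gem selects that Gem plus everything upstream of it and orders them topologically. Gems downstream of the selected Gem are skipped.

```python
from graphlib import TopologicalSorter
from typing import Optional

# Hypothetical Pipeline: each Gem maps to the set of Gems it reads from.
PIPELINE = {
    "source": set(),
    "lookup": set(),
    "clean": {"source"},
    "join": {"clean", "lookup"},
    "aggregate": {"join"},
    "target": {"aggregate"},
}


def upstream_of(graph: dict, gem: str) -> set:
    """Return the Gem itself plus every Gem it transitively depends on."""
    needed, stack = set(), [gem]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph[node])
    return needed


def execution_order(graph: dict, target: Optional[str] = None) -> list:
    """Order in which Gems would run; target=None means the whole Pipeline."""
    nodes = set(graph) if target is None else upstream_of(graph, target)
    # Restrict each Gem's dependencies to the selected sub-graph.
    sub = {n: graph[n] & nodes for n in nodes}
    return list(TopologicalSorter(sub).static_order())


# Clicking play on "join" runs source, lookup, clean and join, but not aggregate or target.
print(execution_order(PIPELINE, "join"))
```

The exact order among independent Gems (here `source` and `lookup`) is not fixed; only the dependency constraints are guaranteed.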

Interims

During development, users often want to look at their data to understand it better and to check whether each transformation produces the expected output. Prophecy generates these data samples as Interims, which are temporarily cached previews of the data after each Gem.
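The idea of an Interim can be sketched in plain Python: run Gems in sequence and keep a small preview of each Gem's output. This is only a conceptual model. In Prophecy, Interims are produced on the Fabric's cluster. The `run_with_interims` helper, the representation of a Gem as a rows-to-rows function, and the "first N rows" preview rule are all assumptions made for illustration.

```python
from typing import Any, Callable

Row = dict[str, Any]
Gem = Callable[[list[Row]], list[Row]]  # hypothetical: a Gem as a rows -> rows function


def run_with_interims(
    rows: list[Row], gems: list[tuple[str, Gem]], preview_rows: int = 3
) -> tuple[list[Row], dict[str, list[Row]]]:
    """Run Gems in order, caching a small preview of each Gem's output."""
    interims: dict[str, list[Row]] = {}
    for name, gem in gems:
        rows = gem(rows)
        # Keep only the first few rows as the preview (this sampling rule is assumed).
        interims[name] = rows[:preview_rows]
    return rows, interims


people = [
    {"name": "Ann", "age": 34},
    {"name": "Bo", "age": 17},
    {"name": "Cy", "age": 52},
    {"name": "Di", "age": 41},
    {"name": "Ed", "age": 29},
]
gems = [
    ("adults_only", lambda rs: [r for r in rs if r["age"] >= 18]),
    ("with_decade", lambda rs: [{**r, "decade": r["age"] // 10 * 10} for r in rs]),
]
final, interims = run_with_interims(people, gems, preview_rows=2)
print(interims["adults_only"])  # [{'name': 'Ann', 'age': 34}, {'name': 'Cy', 'age': 52}]
```

Each preview is a small slice of the data rather than the full output, so you can inspect the result of every step without materializing everything.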

The Pipeline settings control which Gems automatically get Interims, as shown below.

[Image: Data and Job Sampling]

From the Pipeline, select the (1) dropdown and then (2) Pipeline Settings. Select (3) Job Sampling to generate interim samples for scheduled jobs, and use (4) Sampling Mode to choose the level of data sampling. Select (5) Data Sampling to generate interim samples during interactive runs, and set its Sampling Mode accordingly. These two options, Job sampling and Data sampling, are independent: one does not affect the other. For Job sampling, the interim metrics are stored in the compute environment (for example, the Databricks workspace) and are visible in the execution metrics.

Advanced Data sampling setting

There is also a global-level Development Settings flag that admins can use to disable Data sampling for a given Fabric. This flag overrides the Pipeline-level Data sampling settings. When it is disabled, you can't see production data in the Interims when you run the Pipeline.

From the Metadata page, click the Fabrics tab and select the Fabric whose Data sampling setting you want to change. Then click the Advanced tab and use the Allow for data sampling toggle to turn the flag on or off.


Data sampling is enabled by default at the Fabric level. When it is left enabled, Data sampling follows the Pipeline's Data sampling settings. Prophecy samples data during interactive runs to give users the best possible debugging experience.
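The rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not Prophecy code: the class and field names (`PipelineSettings`, `FabricSettings`, `allow_data_sampling`) and the `interims_enabled` function are hypothetical. It assumes, as the text describes, that the Fabric-level flag overrides only Data sampling (interactive runs) and leaves Job sampling (scheduled jobs) untouched.

```python
from dataclasses import dataclass


@dataclass
class PipelineSettings:
    job_sampling: bool   # interim samples for scheduled jobs
    data_sampling: bool  # interim samples for interactive runs


@dataclass
class FabricSettings:
    allow_data_sampling: bool = True  # admin flag in the Fabric's Advanced tab; on by default


def interims_enabled(run_type: str, pipeline: PipelineSettings, fabric: FabricSettings) -> bool:
    """Decide whether Interims are generated for a given kind of run."""
    if run_type == "job":
        # Job sampling is independent of Data sampling and of the Fabric flag (assumed).
        return pipeline.job_sampling
    if run_type == "interactive":
        # The Fabric-level flag overrides the Pipeline-level Data sampling setting.
        return fabric.allow_data_sampling and pipeline.data_sampling
    raise ValueError(f"unknown run type: {run_type!r}")


settings = PipelineSettings(job_sampling=False, data_sampling=True)
print(interims_enabled("interactive", settings, FabricSettings()))  # True
print(interims_enabled("interactive", settings, FabricSettings(allow_data_sampling=False)))  # False
```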

Data sampling modes

Data sampling has several modes (or levels), chosen with the Sampling Mode setting. By default, interactive runs sample data for all components. Note that Vanilla is an interim sampling mode reserved for Shared Databricks clusters.

[Image: Data Sampling Mode - All]

Execution

Once we run a Pipeline, we have several ways to better understand it:

Execution Code

When we run a Pipeline interactively, Prophecy generates the execution code in the backend and then executes it on the selected Fabric.

[Image: Execution code]

info

The execution code can also be copied into a Databricks notebook or shell and executed directly for debugging.

Execution Errors

If there are any errors in the Pipeline, a pop-up window opens showing the execution errors.

[Image: Interactive execution error]

The error can also be seen in the Runtime Logs:

[Image: Interactive execution error logs]

Runtime Logs

Overall progress, with associated timestamps, can be monitored in the Runtime Logs:

[Image: Runtime Logs]

Runtime Metrics

Various Spark metrics collected at runtime can be monitored in Runtime Metrics:

[Image: Runtime Metrics]

Execution Metrics

For interactive runs, execution metrics are collected to make development easier and performance tuning more intuitive. You can access them from the Metadata page, inside the run tab of the Pipeline.

[Image: Execution Metrics]

Shell

The Prophecy IDE comes with a built-in interactive Spark shell that supports both Python and Scala. The shell is an easy way to quickly analyze data or test Spark commands.

[Image: Interactive execution]

info

The Spark context and Spark session are available within the shell as the variables sc and spark, respectively.


Examples

note

You need to be connected to a cluster to access the interactive shell.

Python

[Image: Python interactive execution]

Scala

[Image: Scala interactive execution]