
Execution

Prophecy executes pipelines based on the type of Spark cluster and the operation being performed. You can run pipelines interactively, schedule them, or execute them using the built-in Spark shell. Each execution provides information through logs and metrics to help you manage and monitor data transformations.

Interactive execution

When you run a pipeline in the pipeline canvas, Prophecy generates interim data samples that let you preview the output of your data transformations. There are two ways to run a pipeline interactively:

  • Run the entire pipeline using the play button on the pipeline canvas.
  • Execute the pipeline up to and including a particular gem using the play button on that gem. This is also known as a partial run.

[Image: Interactive run options]

After you run your pipeline, data samples appear between gems. These previews are temporarily cached. You can learn how these data samples are generated, or explore them further with the Data Explorer.

Scheduled execution

When you create jobs in Prophecy, you schedule when certain pipelines will run. Prophecy executes each pipeline on the fabric defined in the job's settings, and it automatically runs jobs once the relevant projects are released and deployed.
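
As a rough illustration, when a job targets a Databricks fabric, the deployed Databricks job carries a Quartz cron schedule. The sketch below shows the schedule portion of a Databricks Jobs API payload; the job name and cron expression are hypothetical.

```python
# Sketch of the schedule block of a Databricks Jobs API payload that a
# deployed job might carry; the name and cron expression are hypothetical.
job_settings = {
    "name": "daily_sales_pipeline",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # every day at 02:00
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
}
```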

Shell execution

Prophecy comes with a built-in interactive Spark shell that supports both Python and Scala. The shell is an easy way to quickly analyze data or test Spark commands. The Spark context and session are available within the shell as the variables sc and spark, respectively.
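
For example, you might run commands like the following in the Python shell. The table name customers is a hypothetical placeholder for a table in your environment.

```python
# A minimal sketch of shell usage; "customers" is a hypothetical table name.
df = spark.read.table("customers")        # the SparkSession is available as `spark`
df.printSchema()                          # inspect the schema
df.limit(10).show()                       # preview a few rows
print(sc.version, sc.defaultParallelism)  # the SparkContext is available as `sc`
```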


Execution information

Once you run a pipeline, there are several ways for you to better understand the execution.

| Callout | Information | Description |
|---|---|---|
| 1 | Problems | Errors from your pipeline execution, shown in a dialog window as well as in the canvas footer. |
| 2 | Runtime logs | The progress of your pipeline runs, with timestamps and any errors. |
| 3 | Execution code | The code Prophecy runs to execute your pipeline. You can copy and paste this code elsewhere for debugging. |
| 4 | Runtime metrics | Various Spark metrics collected during runtime. |
| 5 | Execution metrics | Metrics found in the Metadata of a pipeline, or from the Run History button under the ... menu. |

Use the image below to help you find the relevant information.

[Image: Execution information]
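
To give a sense of what the Execution code (callout 3) contains, here is a simplified, hypothetical sketch of the kind of PySpark code a pipeline with a source, a filter, and a target gem might compile to. Prophecy's actual generated code is structured differently; the table and column names here are placeholders.

```python
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

def source_orders(spark: SparkSession) -> DataFrame:
    # Source gem: read a table (hypothetical name)
    return spark.read.table("orders")

def filter_recent(df: DataFrame) -> DataFrame:
    # Transform gem: keep recent rows (hypothetical column and date)
    return df.filter(F.col("order_date") >= "2024-01-01")

def target_recent_orders(df: DataFrame) -> None:
    # Target gem: write the result
    df.write.mode("overwrite").saveAsTable("recent_orders")

def pipeline(spark: SparkSession) -> None:
    df = source_orders(spark)
    df = filter_recent(df)
    target_recent_orders(df)
```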

Execution on Databricks

Databricks clusters come with various Access Modes. To use Unity Catalog Shared clusters, check for feature support here.

info

When using High Concurrency or Shared Mode Databricks clusters, you may notice a delay when running the first command or when your cluster is scaling up to meet demand. This delay occurs because Prophecy and pipeline dependencies (Maven or Python packages) are being installed. For the best performance, we recommend caching packages in an Artifactory or on DBFS. Please contact us to learn more about this.
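
As one hedged example of the DBFS approach, you could upload a wheel to DBFS once and pre-install it on a cluster through the Databricks Libraries API (POST /api/2.0/libraries/install). The host, token, cluster ID, and wheel path below are placeholders.

```python
# Sketch: pre-install a cached wheel from DBFS onto a Databricks cluster via
# the Libraries API. All identifiers below are placeholders.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={
        "cluster_id": "<cluster-id>",
        "libraries": [
            {"whl": "dbfs:/FileStore/cached-packages/my_dep-1.0-py3-none-any.whl"}
        ],
    },
)
resp.raise_for_status()
```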