Configuration

Configurations let you control various aspects of your Pipeline.

Config Options

Prophecy IDE allows you to define three kinds of configurations:

Spark Configuration

Set runtime Spark configurations as name-value pairs. The name-value pairs are applied to the Spark runtime configuration as spark.conf.set(name, value).

Configurations - Spark
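
For example, a hypothetical pair named spark.sql.shuffle.partitions with value 8 would be applied at runtime roughly as follows:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each Spark Configuration name-value pair is applied to the runtime configuration.
    spark.conf.set("spark.sql.shuffle.partitions", "8")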

Hadoop Configuration

Set Hadoop configurations as name-value pairs. The name-value pairs are applied to the Hadoop configuration as spark.sparkContext.hadoopConfiguration.set(name, value).

Configurations - Hadoop
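
As a rough sketch, a hypothetical pair fs.s3a.connection.maximum with value 100 would be applied like this. The spark.sparkContext.hadoopConfiguration form above is the Scala spelling; in PySpark the Hadoop configuration is reached through the underlying JVM context:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each Hadoop Configuration name-value pair is applied to the Hadoop configuration.
    spark.sparkContext._jsc.hadoopConfiguration().set("fs.s3a.connection.maximum", "100")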

Pipeline Configuration

Config values that are set at the Pipeline level and can be accessed inside any component of the Pipeline. You can create multiple configuration instances per Pipeline.

Configurations - Common

Syntax for using configuration inside Gems

We support language-specific config syntax for the different data types and coding languages used inside your Spark Gems, as well as a common Jinja config syntax. You can use either syntax or a combination of the two.

Language-specific Config syntax

See the following language-specific config syntax for SQL, Scala, and Python in Spark.

Visual Language Selection

  • For visual language SQL: '$config_name'

  • For visual language Scala/Python: Config.config_name

  • For using Spark expression with visual language SQL: expr('$config_name')

  • For using Spark expression with visual language Scala/Python: expr(Config.config_name)
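
As a minimal sketch of the Scala/Python forms in PySpark (the Config object is normally generated by Prophecy; here a stand-in class with hypothetical values is used):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import expr

    # Stand-in for the generated Config object, with hypothetical config values.
    class Config:
        num_top_customers = 10          # integer config
        test_expression = "orders * 2"  # Spark-expression config

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 5), (2, 7)], ["customer_id", "orders"])

    limited = df.limit(Config.num_top_customers)                           # Config.config_name
    with_amounts = df.withColumn("amounts", expr(Config.test_expression))  # expr(Config.config_name)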

Jinja Config syntax

You can choose to use a common Jinja config syntax for configurations inside your Spark Gems.

You must enable it by navigating to ... > Pipeline Settings.

Pipeline Settings

Under the Code section, toggle on Jinja config syntax. This setting is enabled by default for new Pipelines.

Enable Jinja Config syntax

See the following syntax for Jinja in Spark.

  • For Jinja: {{config_name}}

  • For using Spark expression with Jinja: expr('{{config_name}}')

For example, instead of using '$read_env' for SQL or Config.read_env for Scala/Python, you can use {{read_env}}.
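
To illustrate the substitution itself (using the standalone jinja2 package purely as an illustration, not Prophecy's internal mechanism), a placeholder simply resolves to the configured value:

    from jinja2 import Template

    # Illustration only: {{read_env}} resolves to the value of the read_env config.
    print(Template("select * from sales_{{read_env}}").render(read_env="dev"))
    # select * from sales_dev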

Jinja config syntax supports the following functionalities:

  • Integer/String data types - A config value keeps its own data type when used on its own; for example, {{ integer_field }} is treated as an integer. To use the value as part of a string, surround it with other characters, as in '{{ integer_field }}'.

  • Concatenation - You can use multiple Jinja placeholders in the same string; the generated code will be a formatted string. For example, '{{c_string}}___{{ c_int }}'

  • SQL Statement queries - You can use Jinja syntax in SQL Statement Gem queries. For example, select {{col1}} from {{table}} where {{condition}}

  • Nested call_func - You can use Jinja syntax inside call_func or call_function, including calls that are nested within other functions.

    LEAST((LEAST(product.SAS, COALESCE(product.SAS, call_func('data_{{read_env}}.business_rules.datum', DTM, '123456789')))) + 1, cast('9999-12-31' as DATE))

Examples for Pipeline-level configurations

Now let's use the configurations defined above in the Pipeline below.

Pipeline view

Using Config in limit Gem

SQL Visual Language

In the image below, '$num_top_customers' fetches the integer value defined in the configurations.

Config Limit Example

Scala/Python Visual Language

In the image below, Config.num_top_customers fetches the integer value defined in the configurations.

Config Limit Example

Jinja Config

In the image below, {{num_top_customers}} fetches the integer value defined in the configurations.

Config Limit Example

Also see the following syntax examples for specific Gem property field data types:

  • SColumnExpression: lit("{{a.b.c}}")
  • SString: {{ a.b.c }}
  • String: {{ a.b.c }}

You can use the following syntax examples for accessing elements of array and record fields:

  • For array: {{ config1.array_config[23] }}
  • For record: {{ record1.record2.field1 }}
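
Again using the jinja2 package purely as an illustration, with hypothetical array and record configs:

    from jinja2 import Template

    # Hypothetical configs: an array config and a nested record config.
    values = {
        "config1": {"array_config": [f"col_{i}" for i in range(30)]},
        "record1": {"record2": {"field1": "amount"}},
    }
    print(Template("{{ config1.array_config[23] }}").render(**values))  # col_23
    print(Template("{{ record1.record2.field1 }}").render(**values))   # amount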

Using Spark-expression Config type in Gem

Here we use a Spark expression taken directly from a config value to populate a column.

SQL Visual Language

In the image below:

(1) amounts -> expr('$test_expression') comes from a configuration of type Spark-expression
(2) report_name -> '$report_name' comes from a configuration of type string

Config Reformat example

Scala/Python Visual Language

In the image below:

(1) amounts -> expr(Config.test_expression) comes from a configuration of type Spark-expression
(2) report_name -> Config.report_name comes from a configuration of type string

Config Reformat example

note

Similarly, configurations of type Spark-expression can be used directly in Filter, Join, Reformat, and other Gems.

Jinja Config

In the image below:

(1) amounts -> expr('{{test_expression}}') comes from a configuration of type Spark-expression
(2) report_name -> {{report_name}} comes from a configuration of type string

Config Reformat example

Using config in paths for Source/Target Gems

Config can also be used to refer to paths. This comes in handy when you have DEV, QA, and PROD data and want to configure the Dataset (or, more generally, the Job runs) based on the environment you are running in.

Config path example

When using Jinja config for the previous example, you would use dbfs:/Prophecy/{{path_helper}}/CustomersOrders.csv.
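
As a rough sketch, a Source Gem reading the configured path could translate to code like the following (stand-in Config class; the selected configuration instance would set path_helper to, say, dev, qa, or prod):

    from pyspark.sql import SparkSession

    # Stand-in for the generated Config object; the configuration instance
    # determines the value of path_helper.
    class Config:
        path_helper = "dev"

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.option("header", True).csv(
        f"dbfs:/Prophecy/{Config.path_helper}/CustomersOrders.csv"
    )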

Edit Pipeline Name

To change the Pipeline name itself, go to Prophecy's metadata page. Locate the Pipeline within a Project, and click the pencil icon.

Pipeline Configuration instances

Different configuration instances can be defined as required. This comes in handy when a Pipeline needs to run with different configurations in different environments or for different users.

New instances can be configured to override default values, as shown in the images below:

Create config instance

Create pipeline override

Using a particular configuration instance for interactive runs

For interactive runs, you can select a configuration instance as shown in the image below.

Config interactive run

Using configuration instances in Jobs

Particular configuration instances can also be selected in Databricks Jobs.

Config inside job

Overriding configuration values in Jobs

Specific values from a configuration instance can be overridden as shown in the image below:

Config job override

Code

All configuration instances and values are automatically converted to code as well. Default configurations are stored as code, and instance-specific overrides are stored as JSON files, as shown in the images below.

Scala Config code

Config scala code

Python Config code

Config python code
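
The exact generated code varies by Prophecy version, but conceptually the Python config code defines a Config class holding the default values, with instance-specific overrides layered on top from JSON. A simplified conceptual sketch (field names taken from the examples on this page, not literal codegen output):

    import json

    class Config:
        # Defaults are stored as code.
        def __init__(self, num_top_customers=10, report_name="daily_report",
                     test_expression="orders * 2"):
            self.num_top_customers = num_top_customers
            self.report_name = report_name
            self.test_expression = test_expression

        def apply_overrides(self, overrides):
            # Instance-specific overrides are stored as JSON and applied over the defaults.
            for key, value in overrides.items():
                setattr(self, key, value)

    config = Config()
    config.apply_overrides(json.loads('{"report_name": "qa_report"}'))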

Component code

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.functions import col, expr, lit

# Config is provided by the Pipeline's generated configuration code.
def Reformat(spark: SparkSession, in0: DataFrame) -> DataFrame:
    return in0.select(
        col("customer_id"),
        col("orders"),
        col("account_length_days"),
        expr(Config.test_expression).alias("amounts"),
        lit(Config.report_name).alias("report_name")
    )