Configurations
A configuration is a set of predefined variables and values that control how a data pipeline behaves during execution. By using configurations, you can dynamically adapt a pipeline to different environments (e.g., development, testing, production) without modifying the pipeline itself.
Pipeline configurations
Each pipeline in the project editor has a Config option in its header. Opening it shows two tabs: Schema and Config.
Schema tab
The Schema tab is where you declare your variables. These variables will be accessible to any component in the respective pipeline.
Parameter | Description |
---|---|
Name | The name of the variable. |
Type | The data type of the variable. |
Optional | A checkbox to define if the variable is optional. When the checkbox is not selected, you must set a default value for the variable. |
Description | An optional field where you can describe your variable. |
Config tab
The Config tab lets you set default values for your variables. You can create multiple configurations with different default values, which is useful when running your pipeline in different environments (like production and development).
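The relationship between the two tabs can be pictured with a minimal Python sketch. It is only an illustration: the `Variable` class, the `missing_values` helper, and the variable and configuration names are hypothetical and are not part of the product. The schema declares the variables, each named configuration supplies values, and a non-optional variable without a value is reported as missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    """One row of the Schema tab (hypothetical model, for illustration only)."""
    name: str
    dtype: type
    optional: bool = False
    description: str = ""


def missing_values(schema, config):
    """Return the names of required variables that a configuration leaves unset."""
    return [v.name for v in schema if not v.optional and config.get(v.name) is None]


# Hypothetical schema: one required variable and one optional variable.
schema = [
    Variable("input_path", str, description="Where the source data lives"),
    Variable("row_limit", int, optional=True),
]

# Two named configurations with different default values.
configs = {
    "development": {"input_path": "/tmp/sample", "row_limit": 1000},
    "production": {"input_path": "/data/full"},
}

for name, config in configs.items():
    print(name, "missing:", missing_values(schema, config))
```

Here both configurations are complete, because `row_limit` is optional and `input_path` is set in each one.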
Syntax
To call configuration variables in your pipeline, reference them using Jinja syntax. Jinja variable syntax looks like: {{config_name}}.
You can use the following syntax examples for accessing elements of array and record fields:
- For an array: {{ config1.array_config[23] }}
- For a record: {{ record1.record2.field1 }}
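To see how these references map onto nested values, here is a small Python sketch that resolves the same three forms against a dictionary. It is a simplified stand-in written with the standard library, not the Jinja engine and not how the product evaluates expressions. The `lookup` and `render` helpers and all sample values are assumptions made for illustration.

```python
import re

# Matches one {{ ... }} placeholder and captures the path inside it.
PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")
# Splits a path such as "config1.array_config[23]" into names and indexes.
TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def lookup(path, values):
    """Walk a dotted and indexed path through nested dicts and lists."""
    current = values
    for name, index in TOKEN.findall(path):
        current = current[name] if name else current[int(index)]
    return current


def render(template, values):
    """Replace every {{ path }} placeholder with its resolved value."""
    return PLACEHOLDER.sub(lambda m: str(lookup(m.group(1), values)), template)


# Hypothetical configuration values, for illustration only.
values = {
    "config_name": "orders",
    "config1": {"array_config": list(range(100))},
    "record1": {"record2": {"field1": "eu-west"}},
}

print(render("{{config_name}}", values))
print(render("{{ config1.array_config[23] }}", values))
print(render("{{ record1.record2.field1 }}", values))
```

A dot steps into a record field and a bracketed number selects an array element, so paths can mix both forms.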
Jinja is enabled by default in new pipelines. To disable this setting, open the Pipeline Settings and turn off the Enable jinja based configuration toggle.
Depending on the Visual Language configured in your Pipeline Settings, you can also use that language's syntax to call variables.
Visual Language | Syntax | Expression usage |
---|---|---|
SQL | '$config_name' | expr('$config_name') |
Scala | Config.config_name | expr(Config.config_name) |
Python | Config.config_name | expr(Config.config_name) |
Runtime configuration
Once you have set up your configurations, you have to choose which configuration to use at runtime.
Interactive execution
To choose the configuration for interactive runs, open the Pipeline Settings and scroll to the Run Settings section. There, you can change the selected configuration.
Jobs
When you add a pipeline to your job, you can choose the configuration to use during the job. The configuration defaults can also be overridden here.
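Conceptually, a job-level override can be thought of as layering new values on top of the selected configuration. The Python sketch below shows that idea under stated assumptions: `resolve_job_config` is a hypothetical helper, the configuration names and values are invented, and the product may apply overrides differently.

```python
def resolve_job_config(configs, selected, overrides=None):
    """Return the selected configuration with job-level overrides applied.

    Builds a new dict so the stored defaults are left untouched.
    """
    if selected not in configs:
        raise KeyError(f"Unknown configuration: {selected!r}")
    return {**configs[selected], **(overrides or {})}


# Hypothetical configurations and values, for illustration only.
configs = {
    "development": {"input_path": "/tmp/sample", "row_limit": 1000},
    "production": {"input_path": "/data/full", "row_limit": None},
}

# The job selects "production" and overrides a single default.
job_config = resolve_job_config(configs, "production", {"row_limit": 50000})
print(job_config)
```

Only the overridden key changes. Every other value falls back to the selected configuration's default.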
Subgraph configurations
Configurations can also be set inside subgraphs. These configurations apply to execution that happens inside the subgraph. Although each type of subgraph might look different, its configuration settings should include:
- An area to define configurations. It should have a similar appearance to the pipeline configuration UI.
- An option to copy pipeline configurations.
Upon creation, subgraph configurations will also be included in the pipeline configurations.
Code
All configuration instances and values are automatically converted to code.
Scala configuration code
- Open Config.scala in the <pipeline-path>/config folder.
- View the default configuration code.
- Find additional configurations that are packaged as JSON files in the resources/config folder.
Python configuration code
- Open Config.py in the <pipeline>/config folder.
- View the default configuration code.
- Find additional configurations that are packaged as JSON files in the configs/resources/config folder.