Configuration

Configurations let you control various aspects of your pipeline. The Prophecy IDE allows you to define three kinds of configurations:

  1. Pipeline Configuration: a name-value pair per fabric, which can then be accessed in the pipeline as ${name} (see the sketch after this list).
    E.g., for Fabric = dev, SOURCE_PATH: dbfs:/dev/file.csv;
    for Fabric = prod, SOURCE_PATH: dbfs:/prod/file.csv

    note

    Each name-value pair must first be defined in the Common tab and can then be overridden in the individual fabric tabs.

    [Image: Configurations - Common]

  2. Spark Configuration: Runtime Spark configurations as name-value pairs.

    note

    The name-value pairs will be set in the Spark runtime configuration as spark.conf.set(name, value).

    [Image: Configurations - Spark]

    This will be compiled as:

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "10485760")
  3. Hadoop Configuration: Hadoop configurations as name-value pairs.

    note

    The name-value pairs will be set in the Hadoop configuration as spark.sparkContext.hadoopConfiguration.set(name, value).

    [Image: Configurations - Spark]

    This will be compiled as:

    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "my_access_key")
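
How a Pipeline Configuration value reaches the compiled pipeline can be sketched as follows. This is an illustrative Scala sketch rather than the literal generated code: the Config holder and the source function stand in for whatever the compiler emits, with SOURCE_PATH taken from the example above.

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object PipelineConfigSketch {
      // Illustrative stand-in for the generated configuration holder:
      // SOURCE_PATH resolves to dbfs:/dev/file.csv on the dev fabric and
      // to dbfs:/prod/file.csv on the prod fabric.
      object Config {
        var SOURCE_PATH: String = "dbfs:/dev/file.csv"
      }

      // A source gem whose path is given as ${SOURCE_PATH} compiles to a
      // read against the resolved value.
      def source(spark: SparkSession): DataFrame =
        spark.read.option("header", "true").csv(Config.SOURCE_PATH)
    }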

Examples


Dynamic Data Load Using Workflow Configurations

In this example, we'll see how to configure different source file paths for different execution environments. We have two fabrics available for our pipeline, viz. DEV and PROD.

Step 1 - Open the Config window

The configuration is stored in the resources and is parsed by the ConfigStore so that it can be used elsewhere in the code as Config.SOURCE_PATH. The config is resolved at run time according to the fabric the pipeline runs on.

[Image: Configurations - Resource]
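
As a rough sketch of that flow, assuming a Scala pipeline: the per-fabric name-value pairs live in resource files, and a ConfigStore-style loader populates the Config object that the rest of the code reads from. The resource layout (/config/<fabric>.properties) and the FABRIC environment variable below are assumptions made for illustration; the generated code resolves the running fabric itself.

    import java.util.Properties

    // Illustrative sketch: each fabric's name-value pairs live in a resource
    // file, e.g. /config/dev.properties and /config/prod.properties.
    object ConfigStore {
      def load(fabric: String): Properties = {
        val props = new Properties()
        val stream = getClass.getResourceAsStream(s"/config/$fabric.properties")
        require(stream != null, s"No config resource found for fabric '$fabric'")
        props.load(stream)
        props
      }
    }

    object Config {
      // Resolved at run time according to the running fabric, so the same
      // pipeline reads dbfs:/dev/file.csv on DEV and dbfs:/prod/file.csv on PROD.
      private val props = ConfigStore.load(sys.env.getOrElse("FABRIC", "dev"))
      val SOURCE_PATH: String = props.getProperty("SOURCE_PATH")
    }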