Amazon S3 gem

Use a Source and Target gem to read from or write to S3 locations in Prophecy pipelines. This page covers supported file formats, how to create the gem, and how to configure connection details and paths for both Source and Target gems.

Supported file formats

Format | Read | Write
CSV | Yes | Yes
Fixed width | Yes | No
JSON | Yes | Yes
Parquet | Yes | Yes
XLSX | Yes | Yes
XML | Yes | Yes
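
The support matrix above can also be expressed as a small lookup, which may be handy when validating a pipeline design before building it. This is only an illustrative sketch: the `SUPPORTED_FORMATS` dictionary and the `supports` helper are hypothetical names, not part of Prophecy.

```python
# Read/write support per file format, transcribed from the table above.
# SUPPORTED_FORMATS and supports() are hypothetical helpers for illustration.
SUPPORTED_FORMATS = {
    "csv": {"read": True, "write": True},
    "fixed width": {"read": True, "write": False},
    "json": {"read": True, "write": True},
    "parquet": {"read": True, "write": True},
    "xlsx": {"read": True, "write": True},
    "xml": {"read": True, "write": True},
}


def supports(file_format: str, operation: str) -> bool:
    """Return True if file_format supports the operation ('read' or 'write')."""
    capabilities = SUPPORTED_FORMATS.get(file_format.lower())
    if capabilities is None:
        # Formats missing from the table are treated as unsupported.
        return False
    return capabilities.get(operation, False)


# Fixed width can be read by a Source gem but not written by a Target gem.
print(supports("Fixed width", "read"))
print(supports("Fixed width", "write"))
```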

Create an S3 gem

To create an S3 Source or Target gem in your pipeline:

  1. Set up your S3 connection.

  2. Add a new Source or Target gem to your pipeline canvas and open the configuration.

  3. In the Type tab, select S3.

  4. In the Location tab, choose your file format and location.

    Info: For more information on how to configure this screen, see Source location and Target location.

  5. In the Properties tab, set the file properties. These vary based on the file type that you are working with.

    Info: See the list of properties per file type.

  6. (Source only) In the Preview tab, load a sample of the data and verify that it looks correct.

Source location

When setting up an S3 Source gem, you need to decide how Prophecy should locate files at runtime. There are two modes:

  • Filepath: Always read from a specific file path. You can use wildcards in your path definition. If multiple files match, they are unioned into a single output table.

  • Configuration: Dynamically read files provided by a file arrival/change trigger in the pipeline’s schedule. When new or updated files are detected in the monitored directory, the trigger starts the pipeline and passes those files to the S3 Source gem, which unions them into a single output table.

Use the table below to understand how to configure each option.

Parameter | Description
Format type | Type of file to read, such as csv or json.
Select or create connection | Select an existing S3 connection or create a new one.
Choose Path or Configuration | Choose between the following options.
Filepath (Filepath option only) | Path to the file in the S3 bucket. Supports wildcards. Example: /temp/dir/*.csv
Select Configuration (Configuration option only) | File arrival/change trigger configuration that provides the added or modified files for that run.
Include filename Column | Appends a column containing the source filename for each row in the output table.
Delete files after successfully processed | Deletes objects after they are successfully read.
Move files after successfully processed | Moves objects to a specified directory after they are successfully read.
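
To make the wildcard and filename-column behavior concrete, here is a minimal sketch of the semantics described above: every file matching the pattern is read, the rows are unioned into one table, and a column records the source file of each row. It does not show how Prophecy implements this. The `objects` dictionary, the `read_matching` function, and the `filename` column name are all hypothetical, and Python's `fnmatch` rules (where `*` can also match `/`) may differ from the matching rules Prophecy applies.

```python
import csv
import fnmatch
import io

# Hypothetical in-memory stand-in for objects in a bucket: path -> file content.
objects = {
    "/temp/dir/a.csv": "id,amount\n1,10\n2,20\n",
    "/temp/dir/b.csv": "id,amount\n3,30\n",
    "/temp/dir/notes.txt": "not a csv file",
    "/temp/other/c.csv": "id,amount\n4,40\n",
}


def read_matching(objects, pattern, include_filename=False):
    """Union the rows of every CSV object whose path matches the pattern."""
    rows = []
    for path in sorted(objects):
        if not fnmatch.fnmatchcase(path, pattern):
            continue  # Skip objects that do not match the wildcard.
        for row in csv.DictReader(io.StringIO(objects[path])):
            if include_filename:
                # "filename" is an assumed column name for this sketch.
                row["filename"] = path
            rows.append(row)
    return rows


rows = read_matching(objects, "/temp/dir/*.csv", include_filename=True)
for row in rows:
    print(row)
```

In this sketch, only `a.csv` and `b.csv` match the pattern, so the output table contains three rows, each tagged with the file it came from.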

Configuration

If you select Configuration, the gem runs successfully only when the pipeline run is started by the file arrival/change trigger, because the gem expects the trigger to supply the files. In other cases, such as an interactive run or an API-triggered run, there are no files to read, and you will encounter the following error:

Failed due to: Unable to detect modified files for provided File Trigger

For the same reason, you'll see an error if you try to infer the schema in the Properties tab or load a preview in the Preview tab of the Source gem.
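
The reasoning behind this failure can be sketched in a few lines. The snippet below is a conceptual model only, not Prophecy's internal code: `FileTriggerError` and `resolve_input_files` are hypothetical names, and the only detail taken from this page is the error message text.

```python
class FileTriggerError(RuntimeError):
    """Hypothetical error type used only in this sketch."""


def resolve_input_files(triggered_files):
    """Return the files to read in Configuration mode.

    triggered_files is the list of paths supplied by the file arrival/change
    trigger, or None when the run was started another way (for example,
    an interactive run, an API-triggered run, schema inference, or a preview).
    """
    if not triggered_files:
        # No trigger supplied files, so there is nothing for the gem to read.
        raise FileTriggerError(
            "Unable to detect modified files for provided File Trigger"
        )
    return list(triggered_files)


# A run started by the trigger succeeds because files were passed in.
print(resolve_input_files(["/temp/dir/new_orders.csv"]))

# An interactive run has no trigger payload, so the lookup fails.
try:
    resolve_input_files(None)
except FileTriggerError as error:
    print(f"Failed due to: {error}")
```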

Target location

When setting up an S3 Target gem, specify the file format and the S3 location so that the output file is written correctly.

Parameter | Description
Format type | Type of file to write, such as csv or json.
Select or create connection | Select an existing S3 connection or create a new one.
Filepath | S3 path where the output file will be written. Example: s3://my-bucket/data/orders.csv
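
A path such as the example above combines a bucket name and an object key. If you want to sanity-check a target path before entering it, the following sketch splits it with Python's standard library. The `split_s3_path` helper is a hypothetical name used for illustration and is not part of Prophecy.

```python
from urllib.parse import urlparse


def split_s3_path(path: str):
    """Split an s3://bucket/key path into (bucket, key)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Expected a path like s3://bucket/key, got: {path!r}")
    # The URL path starts with "/", which is not part of the object key.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_path("s3://my-bucket/data/orders.csv")
print(bucket)  # my-bucket
print(key)     # data/orders.csv
```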