Amazon S3 gem
Use Source and Target gems to read from or write to S3 locations in Prophecy pipelines. This page covers the supported file formats, how to create an S3 gem, and how to configure connection details and paths for both Source and Target gems.
Supported file formats
Format | Read | Write |
---|---|---|
CSV | Yes | Yes |
Fixed width | Yes | No |
JSON | Yes | Yes |
Parquet | Yes | Yes |
XLSX | Yes | Yes |
XML | Yes | Yes |
Create an S3 gem
To create an S3 Source or Target gem in your pipeline:
- Set up your S3 connection.
- Add a new Source or Target gem to your pipeline canvas and open the configuration.
- In the Type tab, select S3.
- In the Location tab, choose your file format and location.
  Info: For more information on how to configure this screen, jump to Source location and Target location.
- In the Properties tab, set the file properties. These vary based on the file type that you are working with.
  Info: See the list of properties per file type.
- (Source only) In the Preview tab, load a sample of the data and verify that it looks correct.
Source location
When setting up an S3 Source gem, you need to decide how Prophecy should locate files at runtime. There are two modes:
- Filepath: Always read from a specific file path. You can use wildcards in your path definition. If multiple files match, they are unioned into a single output table.
- Configuration: Dynamically read files provided by a file arrival/change trigger in the pipeline’s schedule. When new or updated files are detected in the monitored directory, the trigger starts the pipeline and passes those files to the S3 Source gem, which unions them into a single output table.
Use the table below to understand how to configure each option.
Parameter | Description |
---|---|
Format type | Type of file to read, such as csv or json. |
Select or create connection | Select an existing S3 connection or create a new one. |
Choose Path or Configuration | Choose between the Filepath and Configuration options. |
Filepath (Filepath option only) | Path to the file in the S3 bucket. Supports wildcards. Example: /temp/dir/*.csv |
Select Configuration (Configuration option only) | File arrival/change trigger configuration that provides the added or modified files for that run. |
Include filename Column | Appends a column containing the source filename for each row in the output table. |
Delete files after successfully processed | Deletes objects after they are successfully read. |
Move files after successfully processed | Moves objects to a specified directory after they are successfully read. |
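
To make the Filepath behavior concrete, here is a minimal Python sketch of wildcard matching followed by a union of the matched files. It is an illustration only, not Prophecy's implementation: the helper names `match_keys` and `union_rows` and the `filename` column name are hypothetical, and Python's `fnmatchcase` lets `*` match `/`, which may differ from how Prophecy resolves wildcards.

```python
from fnmatch import fnmatchcase


def match_keys(pattern, keys):
    """Return the keys that match a wildcard pattern, in sorted order.

    Assumption: shell-style wildcards via fnmatch, where `*` also matches `/`.
    """
    return sorted(k for k in keys if fnmatchcase(k, pattern))


def union_rows(files, include_filename=False):
    """Concatenate the rows of several files into one list of dict rows.

    `files` maps a path to its already-parsed rows (a list of dicts).
    When `include_filename` is true, each row gets a hypothetical
    `filename` column holding the path it came from.
    """
    out = []
    for path in sorted(files):
        for row in files[path]:
            merged = dict(row)
            if include_filename:
                merged["filename"] = path
            out.append(merged)
    return out


# Hypothetical object keys in a bucket.
keys = [
    "/temp/dir/a.csv",
    "/temp/dir/b.csv",
    "/temp/dir/notes.txt",
    "/temp/other/c.csv",
]
matched = match_keys("/temp/dir/*.csv", keys)

# Pretend each matched file parsed to a single row.
parsed = {path: [{"id": i}] for i, path in enumerate(matched, start=1)}
table = union_rows(parsed, include_filename=True)
```

Here `matched` contains only the two `.csv` keys under `/temp/dir/`, and `table` holds one row per file with the source path attached, which mirrors what the Include filename Column option describes.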
Configuration
If you select Configuration, the gem will only run successfully during a triggered pipeline run. This is because the gem expects files from the trigger. In other cases, such as an interactive run or an API-triggered run, there will be no files to read. In these situations, you will encounter the following error:
Failed due to: Unable to detect modified files for provided File Trigger
For the same reason, you'll see an error if you try to infer the schema in the Properties tab or load a preview in the Preview tab of the Source gem.
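
The difference between the two modes can be sketched in Python as follows. This is a simplified model of the behavior described above, not Prophecy code: the function `resolve_source_files`, its parameters, and the `NoTriggerFilesError` class are all hypothetical. Only the error message text is taken from this page.

```python
class NoTriggerFilesError(RuntimeError):
    """Hypothetical error type used for this illustration only."""


def resolve_source_files(mode, filepath=None, trigger_files=None):
    """Return the list of files a Source gem would read for one run.

    mode: "filepath" or "configuration" (hypothetical identifiers).
    filepath: the fixed path or wildcard pattern, used in filepath mode.
    trigger_files: files handed over by a file arrival/change trigger;
        empty or None for interactive runs, previews, and schema inference.
    """
    if mode == "filepath":
        if not filepath:
            raise ValueError("filepath mode requires a path")
        return [filepath]
    if mode == "configuration":
        if not trigger_files:
            # No trigger fired, so there is nothing to read.
            raise NoTriggerFilesError(
                "Failed due to: Unable to detect modified files "
                "for provided File Trigger"
            )
        return list(trigger_files)
    raise ValueError(f"unknown mode: {mode!r}")


# A triggered run supplies files, so the lookup succeeds.
triggered = resolve_source_files(
    "configuration", trigger_files=["/temp/dir/new.csv"]
)

# An interactive run supplies none, so the lookup fails.
try:
    resolve_source_files("configuration")
    interactive_error = None
except NoTriggerFilesError as exc:
    interactive_error = str(exc)
```

The point of the sketch is that Configuration mode has no fallback path of its own, so any run that does not originate from the trigger has nothing to read.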
Target location
When setting up an S3 Target gem, specify the file format, the connection, and the path so that Prophecy writes the output file to the correct location.
Parameter | Description |
---|---|
Format type | Type of file to write, such as csv or json. |
Select or create connection | Select an existing S3 connection or create a new one. |
Filepath | S3 path where the output file will be written. Example: s3://my-bucket/data/orders.csv |
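
If you generate target paths programmatically, it can help to check that a path has the expected `s3://bucket/key` shape before using it. The following Python sketch splits a path like the example above into bucket and object key using the standard library. The helper name `split_s3_path` is hypothetical, and this page does not state whether Prophecy performs any such validation itself.

```python
from urllib.parse import urlsplit


def split_s3_path(path):
    """Split an s3:// path into (bucket, key).

    Raises ValueError if the path does not use the s3 scheme
    or has no bucket name.
    """
    parts = urlsplit(path)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// path: {path!r}")
    # The URL path starts with "/", which is not part of the object key.
    return parts.netloc, parts.path.lstrip("/")


bucket, key = split_s3_path("s3://my-bucket/data/orders.csv")
```

For the example path, `bucket` is `my-bucket` and `key` is `data/orders.csv`.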