
CSV

Read or write delimited files such as CSV (Comma-separated Values) or TSV (Tab-separated Values) in Prophecy.

Source

Source Parameters

CSV Source supports all the available Spark read options for CSV.

The following table lists the additional parameters for reading a CSV file:

| Parameter | Description |
| --- | --- |
| Dataset Name | Name of the dataset. |
| Location | Location of the file to be loaded. You can read from a file location, SharePoint (Python only), or SFTP (Python only). |
| Schema | Schema applied to the loaded data. The schema can be defined or edited as JSON, or inferred using the Infer Schema button. |
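Conceptually, the Infer Schema button works like Spark's `inferSchema` option: it samples the data and derives a type for each column. The sketch below illustrates the idea in plain Python; the function name and type-widening rules are illustrative assumptions, not Prophecy's actual implementation.

```python
import csv
import io

def infer_column_types(csv_text):
    """Toy schema inference: guess integer, double, or string for each
    column by testing every data row. Illustrative only."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    def cell_type(value):
        for caster, name in ((int, "integer"), (float, "double")):
            try:
                caster(value)
                return name
            except ValueError:
                pass
        return "string"

    schema = {}
    for i, column in enumerate(header):
        types = {cell_type(row[i]) for row in data}
        # Mixed integer/double widens to double; anything else is a string.
        if types == {"integer"}:
            schema[column] = "integer"
        elif types <= {"integer", "double"}:
            schema[column] = "double"
        else:
            schema[column] = "string"
    return schema

sample = "id,price,city\n1,9.99,Oslo\n2,12,Lima\n"
print(infer_column_types(sample))
# {'id': 'integer', 'price': 'double', 'city': 'string'}
```

Spark's real inference also handles dates, timestamps, booleans, and nulls; defining the schema explicitly as JSON avoids the extra pass over the data that inference requires.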

Target

Target Parameters

CSV Target supports all the available Spark write options for CSV.

The following table lists the additional parameters for writing a CSV file:

| Parameter | Description | Required |
| --- | --- | --- |
| Dataset Name | Name of the dataset. | True |
| Location | Location where the file(s) will be written. For example, `dbfs:/data/output.csv`. | True |
| Write Mode | How to handle existing data. See the Supported Write Modes table below for a list of available options. | False |

Supported Write Modes

| Write Mode | Description |
| --- | --- |
| overwrite | If data already exists, overwrite it with the contents of the DataFrame. |
| append | If data already exists, append the contents of the DataFrame. |
| ignore | If data already exists, do nothing with the contents of the DataFrame. This is similar to `CREATE TABLE IF NOT EXISTS` in SQL. |
| error | If data already exists, throw an exception. |
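The four modes differ only in how they react when the target already exists. The sketch below emulates their semantics against a single local file; it is illustrative only, since Spark and Prophecy actually write directories of partition files rather than one file.

```python
from pathlib import Path

def write_csv(path, text, mode="error"):
    """Emulate Spark's write modes for a single local file.
    Illustrative sketch, not Spark's implementation."""
    target = Path(path)
    if mode == "overwrite":
        # Replace any existing data with the new contents.
        target.write_text(text)
    elif mode == "append":
        # Add the new contents after any existing data.
        with target.open("a") as f:
            f.write(text)
    elif mode == "ignore":
        # Write only if nothing exists yet; otherwise do nothing.
        if not target.exists():
            target.write_text(text)
    elif mode == "error":
        # Refuse to clobber existing data.
        if target.exists():
            raise FileExistsError(f"{path} already exists")
        target.write_text(text)
    else:
        raise ValueError(f"unknown write mode: {mode}")
```

For example, writing with `mode="append"` twice yields both batches of rows, while a second write with `mode="error"` raises an exception instead of touching the file.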

Produce a single output file

Because Spark is distributed, it writes output as multiple partition files by default, one per partition. If you require a single output file, add and enable the Create single CSV file property in the Properties tab of the Target gem.
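If output has already been written as partition files, they can also be merged after the fact outside of Spark. The sketch below concatenates files matching Spark's `part-*` naming convention, keeping only the first header row; it assumes every part file begins with the same header, which holds when the header write option is enabled.

```python
import glob
import os

def merge_part_files(directory, output_path):
    """Concatenate Spark part-* CSV files into a single file,
    keeping the header from the first part only.
    Assumes each part file starts with the same header row."""
    parts = sorted(glob.glob(os.path.join(directory, "part-*")))
    with open(output_path, "w") as out:
        for index, part in enumerate(parts):
            with open(part) as f:
                lines = f.readlines()
            # Skip the duplicate header on every part after the first.
            out.writelines(lines if index == 0 else lines[1:])
```

Note that this loads each part into memory one at a time, so it scales to many small parts; for very large outputs, enabling the single-file property in the Target gem is the simpler option.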