JSON

The JSON (JavaScript Object Notation) file type:

  • Is human-readable, which simplifies how you debug and interact with data.
  • Has a flexible schema, which makes it easy to add or modify fields without changing the file format.

Parameters

| Parameter | Tab | Description |
|-----------|-----|-------------|
| Location | Location | File path to read from or write to the JSON file. |
| Schema | Properties | Schema to apply to the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code. In the Target gem, you can view the schema visually or as JSON code. |
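
For reference, a user-defined schema corresponds to a Spark StructType that is applied at read time. Below is a minimal sketch, assuming a hypothetical two-field dataset; the function name, field names, and path are placeholders, not Prophecy-generated code:

from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

def read_json_with_schema(spark: SparkSession) -> DataFrame:
    # Hypothetical user-defined schema; match the fields to your data
    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
    ])
    return spark.read.schema(schema).format("json").load("dbfs:/FileStore/data/example.json")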

Source

The Source gem reads data from JSON files and allows you to optionally specify the following additional properties.

Source properties

| Property name | Description | Default |
|---------------|-------------|---------|
| Description | Description of your dataset. | None |
| Use user-defined schema | Whether to use the schema you define. | true |
| Parse Multi-line records | Whether to parse one record, which may span multiple lines, per file. JSON built-in functions ignore this option. | false |
| New line separator | Sets a separator for each line. The separator can be one or more characters. JSON built-in functions ignore this option. | \r, \r\n, and \n |
| Infer primitive values as string type | Whether to infer all primitive values as a String type. | false |
| Infer floating-point values as decimal or double type | Whether to infer all floating-point values as a Decimal type. If a value does not fit in Decimal, the Source gem infers it as a Double. | false |
| Ignore Java/C++ style comment in JSON records | Whether to ignore Java and C++ style comments in JSON records. | false |
| Allow unquoted field names | Whether to allow unquoted JSON field names. | false |
| Allow single quotes | Whether to allow single quotes in addition to double quotes. | true |
| Allow leading zero in numbers | Whether to allow leading zeros in numbers. | false |
| Allow Backslash escaping | Whether to accept quoting of all characters using the backslash quoting mechanism. | false |
| Allow unquoted control characters in JSON string | Whether to allow unquoted control characters. | false |
| Mode to deal with corrupt records | How to handle corrupt data. For a list of the possible values, see Supported corrupt record modes. | PERMISSIVE |
| Column name of a corrupt record | Name of the column to create for corrupt records. | _corrupt_records |
| Date Format String | Sets the string that indicates a date format. | yyyy-MM-dd |
| Timestamp Format String | Sets the string that indicates a timestamp format. | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] |
| Sampling ratio for schema inferring | Fraction of rows to use for schema inference. JSON built-in functions ignore this option. | 1.0 |
| Ignore column with all null values during schema inferring | Whether to ignore columns of all null values or empty arrays during schema inference. | false |
| Recursive File Lookup | Whether to recursively load files and disable partition inferring. If the data source explicitly specifies the partitionSpec when recursiveFileLookup is true, the Source gem throws an exception. | false |

Supported corrupt record modes

| Mode | Description |
|------|-------------|
| PERMISSIVE | Put the malformed string into the corrupt records column, and set the malformed fields to null. |
| DROPMALFORMED | Ignore the entire corrupted record. This mode is not supported in the JSON built-in functions. |
| FAILFAST | Throw an exception when the gem encounters a corrupted record. |
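
To illustrate how the modes behave, here is a minimal sketch using the underlying Spark reader options; the function name and path are placeholders, not Prophecy-generated code:

from pyspark.sql import SparkSession, DataFrame

def read_with_mode(spark: SparkSession, mode: str) -> DataFrame:
    # mode is "PERMISSIVE", "DROPMALFORMED", or "FAILFAST"
    return (
        spark.read.format("json")
        .option("mode", mode)
        # Only used by PERMISSIVE: malformed records land in this column
        .option("columnNameOfCorruptRecord", "_corrupt_records")
        .load("dbfs:/FileStore/data/example.json")
    )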

Example

Generated Code

tip

To see the generated source code of your project, switch to the Code view in the project header.

def read_json(spark: SparkSession) -> DataFrame:
    return spark.read.format("json").load("dbfs:/FileStore/data/example.json")
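
The Source properties above map to standard Spark JSON reader options. A sketch with a few of them set explicitly, assuming the same placeholder path (the function name is illustrative; the option names are Spark's):

from pyspark.sql import SparkSession, DataFrame

def read_json_with_options(spark: SparkSession) -> DataFrame:
    return (
        spark.read.format("json")
        .option("multiLine", True)            # Parse Multi-line records
        .option("primitivesAsString", False)  # Infer primitive values as string type
        .option("prefersDecimal", False)      # Infer floating-point values as decimal or double type
        .option("samplingRatio", 1.0)         # Sampling ratio for schema inferring
        .option("dateFormat", "yyyy-MM-dd")   # Date Format String
        .load("dbfs:/FileStore/data/example.json")
    )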

Target

The Target gem writes data to JSON files and allows you to optionally specify the following additional properties.

Target properties

| Property name | Description | Default |
|---------------|-------------|---------|
| Description | Description of your dataset. | None |
| Line Separator | Defines the line separator to use when writing. | \n |
| Write Mode | How to handle existing data. For a list of the possible values, see Supported write modes. | error |
| Partition Columns | List of columns to partition the JSON file by. | None |
| Compression Codec | Compression codec to use when writing the JSON file. Supported codecs: bzip2, gzip, lz4, snappy, and deflate. JSON built-in functions ignore this option. | None |
| Date Format String | String that indicates a date format. | yyyy-MM-dd |
| Timestamp Format String | String that indicates a timestamp format. | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] |
| Encoding | Specifies the charset encoding of saved JSON files. JSON built-in functions ignore this option. | UTF-8 |
| Ignore null fields | Whether to ignore null fields when generating JSON objects. | false |

Supported write modes

| Write mode | Description |
|------------|-------------|
| error | If the data already exists, throw an exception. |
| overwrite | If the data already exists, overwrite it with the contents of the DataFrame. |
| append | If the data already exists, append the contents of the DataFrame to it. |
| ignore | If the data already exists, do nothing with the contents of the DataFrame. This is similar to the CREATE TABLE IF NOT EXISTS clause in SQL. |

Example

Generated Code

tip

To see the generated source code of your project, switch to the Code view in the project header.

def write_json(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("json")\
        .mode("overwrite")\
        .save("dbfs:/data/test_output.json")

Producing A Single Output File

caution

We do not recommend this for extremely large datasets because it may overwhelm the worker node that writes the file.

Due to Spark's distributed nature, Prophecy writes output files as multiple separate partition files. If you want a single output file, such as for reporting or exporting to an external system, use a Repartition gem in Coalesce mode with one output partition:
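
In generated code, this corresponds to coalescing the DataFrame to a single partition before the write. A minimal sketch, with a placeholder function name and path:

from pyspark.sql import SparkSession, DataFrame

def write_single_json(spark: SparkSession, in0: DataFrame):
    # coalesce(1) funnels all data through one partition, so Spark writes a single part file
    in0.coalesce(1).write.format("json").mode("overwrite").save("dbfs:/data/single_output.json")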