JSON
The JSON (JavaScript Object Notation) file type:
- Is human-readable, which simplifies how you debug and interact with data.
- Has a flexible schema, which makes it easy to add or modify fields without changing the file format.
Parameters
Parameter | Tab | Description |
---|---|---|
Location | Location | File path to read the JSON file from or write it to. |
Schema | Properties | Schema to apply on the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code. In the Target gem, you can view the schema visually or as JSON code. |
Source
The Source gem reads data from JSON files and allows you to optionally specify the following additional properties.
Source properties
Property name | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Use user-defined schema | Whether to use the schema you define. | true |
Parse Multi-line records | Whether to parse one record, which may span multiple lines, per file. JSON built-in functions ignore this option. | false |
New line separator | Sets a separator for each line. The separator can be one or more characters. JSON built-in functions ignore this option. | \r, \r\n, and \n |
Infer primitive values as string type | Whether to infer all primitive values as a String type. | false |
Infer floating-point values as decimal or double type | Whether to infer all floating-point values as a Decimal type. If the value does not fit in Decimal, the Source gem infers it as a Double. | false |
Ignore Java/C++ style comment in Json records | Whether to ignore Java and C++ style comments in JSON records. | false |
Allow unquoted field names | Whether to allow unquoted JSON field names. | false |
Allow single quotes | Whether to allow single quotes in addition to double quotes. | true |
Allow leading zero in numbers | Whether to allow leading zeros in numbers. | false |
Allow Backslash escaping | Whether to accept quotes on all characters using the backslash quoting mechanism. | false |
Allow unquoted control characters in JSON string | Whether to allow unquoted control characters. | false |
Mode to deal with corrupt records | How to handle corrupt data. For a list of the possible values, see Supported corrupt record modes. | PERMISSIVE |
Column name of a corrupt record | Name of the column to create for corrupt records. | _corrupt_records |
Date Format String | Sets the string that indicates a date format. | yyyy-MM-dd |
Timestamp Format String | Sets the string that indicates a timestamp format. | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] |
Sampling ratio for schema inferring | Defines a fraction of rows to use for schema inference. JSON built-in functions ignore this option. | 1.0 |
Ignore column with all null values during schema inferring | Whether to ignore column of all null values or empty arrays during schema inference. | false |
Recursive File Lookup | Whether to recursively load files and disable partition inferring. If the data source explicitly specifies the partitionSpec when recursiveFileLookup is true, the Source gem throws an exception. | false |
Supported corrupt record modes
Mode | Description |
---|---|
PERMISSIVE | Put the malformed string into the corrupt records column, and set the malformed fields to null. |
DROPMALFORMED | Ignore the entire corrupted record. This mode is not supported in the JSON built-in functions. |
FAILFAST | Throw an exception when it encounters a corrupted record. |
Example
Generated Code
To see the generated source code of your project, switch to the Code view in the project header.
- Python
- Scala
def read_json(spark: SparkSession) -> DataFrame:
    return spark.read.format("json").load("dbfs:/FileStore/data/example.json")
object ReadJson {
def apply(spark: SparkSession): DataFrame =
spark.read
.format("json")
.load("dbfs:/FileStore/data/example.json")
}
Target
The Target gem writes data to JSON files and allows you to optionally specify the following additional properties.
Target properties
Property name | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Line Separator | Defines the line separator to use when writing. | \n |
Write Mode | How to handle existing data. For a list of the possible values, see Supported write modes. | error |
Partition Columns | List of columns to partition the JSON file by. | None |
Compression Codec | Compression codec when writing to the JSON file. The JSON file supports the following codecs: bzip2 , gzip , lz4 , snappy , and deflate . JSON built-in functions ignore this option. | None |
Date Format String | String that indicates a date format. | yyyy-MM-dd |
Timestamp Format String | String that indicates a timestamp format. | yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX] |
Encoding | Specifies the encoding (charset) of the saved JSON files. JSON built-in functions ignore this option. | UTF-8 |
Ignore null fields | Whether to ignore null fields when generating JSON objects. | false |
Supported write modes
Write mode | Description |
---|---|
error | If the data already exists, throw an exception. |
overwrite | If the data already exists, overwrite the data with the contents of the DataFrame . |
append | If the data already exists, append the contents of the DataFrame . |
ignore | If the data already exists, do nothing with the contents of the DataFrame . This is similar to the CREATE TABLE IF NOT EXISTS clause in SQL. |
Example
Generated Code
To see the generated source code of your project, switch to the Code view in the project header.
- Python
- Scala
def write_json(spark: SparkSession, in0: DataFrame):
in0.write\
.format("json")\
.mode("overwrite")\
.save("dbfs:/data/test_output.json")
object write_json {
def apply(spark: SparkSession, in: DataFrame): Unit =
in.write
.format("json")
.mode("overwrite")
.save("dbfs:/data/test_output.json")
}
Producing A Single Output File
We do not recommend this for extremely large datasets because it may overwhelm the worker node that writes the file.
Due to Spark's distributed nature, Prophecy writes output files as multiple separate partition files. If you want a single output file, such as for reporting or exporting to an external system, use a Repartition gem in Coalesce mode with one output partition: