JSON

Read and write JSON-formatted files

Source

Source Parameters

JSON Source supports all the available Spark read options for JSON.
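
For example, standard Spark JSON read options such as multiLine and mode can be set alongside the parameters below. A minimal plain-Spark sketch (the path and option values here are illustrative, not Prophecy-generated code):

# A plain-Spark sketch of passing standard JSON read options;
# the path and option values are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("json")
    .option("multiLine", True)     # parse records that span multiple lines
    .option("mode", "PERMISSIVE")  # tolerate malformed records instead of failing
    .load("dbfs:/data/test.json")
)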

The table below lists the additional parameters needed to read a JSON file:

| Parameter | Description | Required |
|---|---|---|
| Dataset Name | Name of the Dataset | True |
| Location | Location of the file(s) to be loaded, e.g. dbfs:/data/test.json | True |
| Schema | Schema to be applied on the loaded data. Can be defined or edited as JSON, or inferred using the Infer Schema button (see the sketch after this table). | True |
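
When the schema is defined as JSON rather than inferred, the effect in plain Spark is an explicit schema applied to the read. A minimal sketch, with hypothetical field names:

# Sketch of applying an explicit schema instead of inferring one;
# the field names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("id", LongType()),
    StructField("name", StringType()),
])

df = spark.read.format("json").schema(schema).load("dbfs:/data/test.json")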

Example

Generated Code

Python:

from pyspark.sql import SparkSession, DataFrame

def read_json(spark: SparkSession) -> DataFrame:
    return spark.read.format("json").load("dbfs:/FileStore/data/example.json")

Scala:

import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadJson {

  def apply(spark: SparkSession): DataFrame =
    spark.read
      .format("json")
      .load("dbfs:/FileStore/data/example.json")

}

Target

Target Parameters

JSON Target supports all the available Spark write options for JSON.
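
For example, standard Spark JSON write options such as compression and ignoreNullFields can be set alongside the parameters below. A rough sketch (df stands for any upstream DataFrame; the path and option values are illustrative):

# Sketch of passing standard Spark JSON write options; values are illustrative.
(
    df.write.format("json")
    .option("compression", "gzip")        # write gzip-compressed part files
    .option("ignoreNullFields", "false")  # keep null-valued fields in the output
    .mode("overwrite")
    .save("dbfs:/data/output.json")
)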

The table below lists the additional parameters needed to write a JSON file:

| Parameter | Description | Required |
|---|---|---|
| Dataset Name | Name of the Dataset | True |
| Location | Location where the output file(s) will be written, e.g. dbfs:/data/output.json | True |

Example

Generated Code

Python:

from pyspark.sql import SparkSession, DataFrame

def write_json(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("json")\
        .mode("overwrite")\
        .save("dbfs:/data/test_output.json")

Scala:

import org.apache.spark.sql.{DataFrame, SparkSession}

object write_json {

  def apply(spark: SparkSession, in: DataFrame): Unit =
    in.write
      .format("json")
      .mode("overwrite")
      .save("dbfs:/data/test_output.json")

}

Producing a single output file

Because of Spark's distributed nature, output is written as multiple separate partition files. If you need a single output file (for example, for reporting or for export to an external system), use a Repartition Gem in Coalesce mode with 1 output partition, as sketched below:
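
In plain Spark terms, Coalesce mode with 1 output partition corresponds to calling coalesce(1) before the write. A sketch of the equivalent code (the function name and path are illustrative, not the Gem's generated code):

# Equivalent plain-Spark sketch, not the Gem's generated code:
# coalesce(1) funnels all rows through a single partition, so Spark
# writes one part file inside the output directory.
from pyspark.sql import SparkSession, DataFrame

def write_single_json(spark: SparkSession, in0: DataFrame):
    in0.coalesce(1)\
        .write\
        .format("json")\
        .mode("overwrite")\
        .save("dbfs:/data/single_output.json")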


caution

This is not recommended for extremely large datasets, as it may overwhelm the single worker node writing the file.