JSON
Read and write JSON-formatted files
Source
Source Parameters
The JSON Source supports all available Spark read options for JSON.
The table below lists the additional parameters needed to read a JSON file:
Parameter | Description | Required |
---|---|---|
Dataset Name | Name of the Dataset | True |
Location | Location of the file(s) to be loaded, e.g. dbfs:/data/test.json | True |
Schema | Schema to be applied to the loaded data. Can be defined or edited as JSON, or inferred using the Infer Schema button. | True |
Example
Generated Code
from pyspark.sql import SparkSession, DataFrame

def read_json(spark: SparkSession) -> DataFrame:
    return spark.read.format("json").load("dbfs:/FileStore/data/example.json")
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadJson {
  def apply(spark: SparkSession): DataFrame =
    spark.read
      .format("json")
      .load("dbfs:/FileStore/data/example.json")
}
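Because the Gem passes options straight through to Spark's JSON reader, any standard read option can be set alongside the schema. Below is a minimal Python sketch, assuming an illustrative two-column schema and path; multiLine and dateFormat are standard Spark JSON read options, and read_json_with_options is a hypothetical name, not generated code.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql.types import StructType, StructField, StringType, LongType

def read_json_with_options(spark: SparkSession) -> DataFrame:
    # Illustrative schema; in the Gem this can be edited as JSON or inferred
    # with the Infer Schema button.
    schema = StructType([
        StructField("id", LongType()),
        StructField("name", StringType()),
    ])
    # multiLine and dateFormat are standard Spark JSON read options.
    return (
        spark.read.format("json")
        .schema(schema)
        .option("multiLine", True)
        .option("dateFormat", "yyyy-MM-dd")
        .load("dbfs:/FileStore/data/example.json")
    )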
Target
Target Parameters
The JSON Target supports all available Spark write options for JSON.
The table below lists the additional parameters needed to write a JSON file:
Parameter | Description | Required |
---|---|---|
Dataset Name | Name of the Dataset | True |
Location | Location where the output file(s) will be written, e.g. dbfs:/data/output.json | True |
Example
Generated Code
from pyspark.sql import SparkSession, DataFrame

def write_json(spark: SparkSession, in0: DataFrame):
    in0.write\
        .format("json")\
        .mode("overwrite")\
        .save("dbfs:/data/test_output.json")
import org.apache.spark.sql.{DataFrame, SparkSession}

object write_json {
  def apply(spark: SparkSession, in: DataFrame): Unit =
    in.write
      .format("json")
      .mode("overwrite")
      .save("dbfs:/data/test_output.json")
}
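Likewise, standard Spark JSON write options can be supplied on the Target. Below is a minimal Python sketch, assuming an illustrative output path; compression is a standard Spark JSON write option, and write_json_compressed is a hypothetical name, not generated code.
from pyspark.sql import DataFrame

def write_json_compressed(in0: DataFrame):
    # compression is a standard Spark JSON write option (e.g. gzip, snappy).
    (
        in0.write.format("json")
        .option("compression", "gzip")
        .mode("overwrite")
        .save("dbfs:/data/test_output_compressed.json")
    )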
Producing a single output file
Because of Spark's distributed nature, output files are written as multiple separate partition files. If you need a single output file (for example, for reporting or for export to an external system), use a Repartition Gem in Coalesce mode with 1 output partition, as sketched below:
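Outside the visual editor, the equivalent Spark code simply coalesces the DataFrame to a single partition before writing. A minimal Python sketch, assuming an illustrative path and function name:
from pyspark.sql import DataFrame

def write_single_json_file(in0: DataFrame):
    # coalesce(1) funnels all data through a single partition, so Spark
    # emits exactly one part file inside the output directory.
    in0.coalesce(1).write\
        .format("json")\
        .mode("overwrite")\
        .save("dbfs:/data/single_file_output.json")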
caution
This is not recommended for extremely large datasets, as it may overwhelm the single worker node that writes the file.