
Text

Allows you to read or write plain Text files.

Source

Reads data from Text files at the given Location.

Source Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| Location | File path where the Text files are located | True | None |
| Schema | Schema to be applied on the loaded data. Can be defined/edited as JSON or inferred using the Infer Schema button. | True | None |
| Recursive File Lookup | Recursively load files from the given Location. Disables partition discovery. An exception is thrown if this option and a partitionSpec are both specified. | False | False |
| Line Separator | Defines the line separator that should be used for reading or writing. | False | `\r`, `\r\n`, `\n` |
| Read as a single row | If true, read each file from the input path(s) as a single row. | False | False |
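For illustration, a minimal sketch of how the Recursive File Lookup and Read as a single row options map onto the underlying Spark reader (the directory path here is an assumed placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recursively load every text file under the directory, skipping partition discovery.
recursive_df = spark.read.text("dbfs:/FileStore/raw/", recursiveFileLookup=True)

# Read each file as a single row instead of one row per line.
whole_df = spark.read.text("dbfs:/FileStore/raw/", wholetext=True)
```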

Example

Generated Code

```python
from pyspark.sql import SparkSession, DataFrame


def read_text(spark: SparkSession) -> DataFrame:
    # Read each line of the file as one row in a single string column.
    return spark.read\
        .format("text")\
        .text("dbfs:/FileStore/customers.txt", wholetext=False, lineSep="\n")
```


Target

Writes data as text files at the specified path.

Target Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| Location | File path where the text files will be written | True | None |
| Compression | Compression codec to use when saving to file. One of the known case-insensitive shortened names: none, bzip2, gzip, lz4, snappy, and deflate. | False | None |
| Write Mode | How to handle existing data. See the Supported Write Modes table below for the available options. | True | error |
| Partition Columns | List of columns to partition the text files by | False | None |
| Line Separator | Defines the line separator that should be used for writing. | False | `\n` |
info

The Text data source supports only a single column apart from the partition columns. An AnalysisException is thrown if the input DataFrame to the Target Gem has more than one column apart from the partition columns.
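If the incoming DataFrame carries several data columns, one common workaround is to concatenate them into a single string column before the Target Gem. A minimal sketch, with assumed column names:

```python
from pyspark.sql import functions as F

# Collapse the assumed columns "customer_id" and "customer_name" into the
# single string column required by the text target.
single_col = in0.select(
    F.concat_ws("|", "customer_id", "customer_name").alias("value")
)
```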

Supported Write Modes

| Write Mode | Description |
|---|---|
| overwrite | If data already exists, overwrite it with the contents of the DataFrame. |
| append | If data already exists, append the contents of the DataFrame. |
| ignore | If data already exists, do nothing with the contents of the DataFrame. This is similar to a CREATE TABLE IF NOT EXISTS in SQL. |
| error | If data already exists, throw an exception. |
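As a sketch, each write mode is passed straight through to Spark's DataFrameWriter.mode(); for example, ignore makes the write a no-op when output already exists (the path is an assumed placeholder):

```python
# With mode("ignore"), the write does nothing if data already exists at the path.
in0.write.mode("ignore").text("dbfs:/FileStore/customers.txt")
```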

Example

Generated Code

```python
def write_text(spark: SparkSession, in0: DataFrame):
    # Write the single-column DataFrame as gzip-compressed text, one line per row.
    in0.write\
        .format("text")\
        .mode("overwrite")\
        .text("dbfs:/FileStore/customers.txt", compression="gzip", lineSep="\n")
```
info

To learn more about tweaking Text file related properties in the Spark config, refer to the Spark configuration documentation.