
Text

Allows you to read or write plain Text files.

Source

Reads data from Text files at the given Location.

Source Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| Location | File path where the Text files are located | True | None |
| Schema | Schema to be applied on the loaded data. Can be defined/edited as JSON or inferred using the Infer Schema button. | True | None |
| Recursive File Lookup | Recursively load files from the given Location. Disables partition discovery. An exception is thrown if this option and a partitionSpec are both specified. | False | False |
| Line Separator | Defines the line separator that should be used for reading or writing. | False | `\r`, `\r\n`, `\n` |
| Read as a single row | If true, read each file from the input path(s) as a single row. | False | False |
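For illustration, a minimal sketch of how the Recursive File Lookup and Read as a single row options map onto the underlying Spark reader (the directory path here is an assumed placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Recursively load every text file under the directory, skipping partition discovery.
recursive_df = spark.read.text("dbfs:/FileStore/raw/", recursiveFileLookup=True)

# Read each file as a single row instead of one row per line.
whole_df = spark.read.text("dbfs:/FileStore/raw/", wholetext=True)
```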

Example

Generated Code

```python
from pyspark.sql import SparkSession, DataFrame


def read_text(spark: SparkSession) -> DataFrame:
    # Read each line of the file as one row in a single string column.
    return spark.read\
        .format("text")\
        .text("dbfs:/FileStore/customers.txt", wholetext=False, lineSep="\n")
```


Target

Writes data as text files at the specified path.

Target Parameters

| Parameter | Description | Required | Default |
|---|---|---|---|
| Location | File path where the text files will be written | True | None |
| Compression | Compression codec to use when saving to file. One of the known case-insensitive shortened names: none, bzip2, gzip, lz4, snappy, and deflate. | False | None |
| Write Mode | How to handle existing data. See the Supported Write Modes table below for the available options. | True | error |
| Partition Columns | List of columns to partition the text files by | False | None |
| Line Separator | Defines the line separator that should be used for writing. | False | `\n` |
info

The Text data source supports only a single column apart from the partition columns. An AnalysisException is thrown if the input DataFrame to the Target Gem has more than one column apart from the partition columns.
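If the incoming DataFrame carries several data columns, one common workaround is to concatenate them into a single string column before the Target Gem. A minimal sketch, with assumed column names:

```python
from pyspark.sql import functions as F

# Collapse the assumed columns "customer_id" and "customer_name" into the
# single string column required by the text target.
single_col = in0.select(
    F.concat_ws("|", "customer_id", "customer_name").alias("value")
)
```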

Supported Write Modes

| Write Mode | Description |
|---|---|
| overwrite | If data already exists, overwrite it with the contents of the DataFrame. |
| append | If data already exists, append the contents of the DataFrame. |
| ignore | If data already exists, do nothing with the contents of the DataFrame. This is similar to a CREATE TABLE IF NOT EXISTS in SQL. |
| error | If data already exists, throw an exception. |
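As a sketch, each write mode is passed straight through to Spark's DataFrameWriter.mode(); for example, ignore makes the write a no-op when output already exists (the path is an assumed placeholder):

```python
# With mode("ignore"), the write does nothing if data already exists at the path.
in0.write.mode("ignore").text("dbfs:/FileStore/customers.txt")
```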

Example

Generated Code

```python
def write_text(spark: SparkSession, in0: DataFrame):
    # Write the single-column DataFrame as gzip-compressed text, one line per row.
    in0.write\
        .format("text")\
        .mode("overwrite")\
        .text("dbfs:/FileStore/customers.txt", compression="gzip", lineSep="\n")
```
info

To learn more about tweaking Text file related properties in the Spark config, refer to the Spark configuration documentation.