Text
Available in: ProphecySparkBasicsPython 0.0.1+, ProphecySparkBasicsScala 0.0.1+, Databricks UC Single Cluster 14.3+, Databricks UC Shared 14.3+, Livy 3.2.0+
The Text file type is:
- Easy to read from, write to, and share.
- Compatible with many programs, which makes it easy to exchange data.
Parameters
Parameter | Tab | Description |
---|---|---|
Location | Location | File path to read from or write to the Text file. |
Schema | Properties | Schema to apply to the loaded data. In the Source gem, you can define or edit the schema visually or in JSON code. In the Target gem, you can view the schema visually or as JSON code. |
Source
The Source gem reads data from Text files and allows you to optionally specify the following additional properties.
Source properties
Property name | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Enforce schema | Whether to use the schema you define. | true |
Read file as single row | Whether to read each file from the input path as a single row (see the sketch after this table). | false
Line Separator | Defines the line separator to use for parsing. The separator can be one or more characters. | \r , \r\n , and \n
Recursive File Lookup | Whether to recursively load files and disable partition inferring. If the data source explicitly specifies the partitionSpec when recursiveFileLookup is true, the Source gem throws an exception. | false
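These properties map to options on Spark's text reader. The following is a minimal sketch of how the single-row and recursive-lookup behaviors differ; the session setup and the `dbfs:/FileStore/landing/` path are illustrative assumptions, not code Prophecy generates.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Default behavior: each line of the file becomes one row
# in a single string column named `value`.
lines_df = spark.read.text("dbfs:/FileStore/customers.txt")

# Read file as single row: the entire file becomes one row.
whole_df = spark.read.text("dbfs:/FileStore/customers.txt", wholetext=True)

# Recursive File Lookup: descend into subdirectories and
# disable partition inference.
nested_df = spark.read \
    .option("recursiveFileLookup", "true") \
    .text("dbfs:/FileStore/landing/")
```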
Example
Generated Code
tip
To see the generated source code of your project, switch to the Code view in the project header.
- Python
- Scala
def read_text(spark: SparkSession) -> DataFrame:
    return spark.read\
        .text("dbfs:/FileStore/customers.txt", wholetext=False, lineSep="\n")
object read_text {
  def apply(spark: SparkSession): DataFrame =
    spark.read
      .format("text")
      .option("lineSep", "\n")
      .load("dbfs:/FileStore/customers.txt")
}
Target
The Target gem writes data to Text files and allows you to optionally specify the following additional properties.
Target properties
Property name | Description | Default |
---|---|---|
Description | Description of your dataset. | None |
Write Mode | How to handle existing data. For a list of the possible values, see Supported write modes. | error |
Partition Columns | List of columns to partition the Text files by. The Text file type only supports a single column apart from the partition columns (see the sketch after this table). If the input DataFrame contains more than one column apart from the partition columns, the Target gem throws an AnalysisException error. | None
Compression Codec | Compression codec to use when writing the Text file. The Text file type supports the following codecs: none, bzip2, gzip, lz4, snappy, and deflate. | None
Line Separator | Defines the line separator to use when writing text. | \n
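To illustrate the partition-column constraint, here is a hedged sketch: the DataFrame contents and the output path are made-up examples. After `partitionBy` consumes the `country` column, only the single `value` column remains, which is what the text sink requires.

```python
# A two-column DataFrame: one value column plus one partition column.
df = spark.createDataFrame(
    [("alice", "US"), ("bob", "DE")],
    ["value", "country"],
)

# Partitioning by `country` leaves only `value` to be written.
# Any additional non-partition column would raise an AnalysisException.
df.write \
    .partitionBy("country") \
    .mode("overwrite") \
    .text("dbfs:/FileStore/customers_by_country")
```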
Supported write modes
Write mode | Description |
---|---|
error | If the data already exists, throw an exception. |
overwrite | If the data already exists, overwrite the data with the contents of the DataFrame . |
append | If the data already exists, append the contents of the DataFrame . |
ignore | If the data already exists, do nothing with the contents of the DataFrame . This is similar to the CREATE TABLE IF NOT EXISTS clause in SQL. |
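For illustration, here is how each mode looks on the writer. The DataFrame `df` and the output path are illustrative assumptions, and each call stands alone rather than running in sequence.

```python
# error (default): fail if the target path already contains data.
df.write.mode("error").text("dbfs:/FileStore/out/customers")

# ignore: leave existing data untouched and skip the write.
df.write.mode("ignore").text("dbfs:/FileStore/out/customers")

# append: add the new rows alongside the existing files.
df.write.mode("append").text("dbfs:/FileStore/out/customers")
```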
Example
Generated Code
tip
To see the generated source code of your project, switch to the Code view in the project header.
- Python
- Scala
def write_text(spark: SparkSession, in0: DataFrame):
    in0.write\
        .mode("overwrite")\
        .text("dbfs:/FileStore/customers.txt", compression="gzip", lineSep="\n")
object write_text {
def apply(spark: SparkSession, in: DataFrame): Unit =
in.write
.format("text")
.mode("overwrite")
.option("compression", "gzip")
.option("lineSep", "\n")
.save("dbfs:/FileStore/customers.txt")
}