

ORC (Optimized Row Columnar) is a columnar file format designed for Spark/Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Because ORC files are type-aware, the writer chooses the most appropriate encoding for the type and builds an internal index as the file is written.

This Gem allows you to read from or write to ORC files.


Reads data from ORC files present at a path.

Source Parameters

| Parameter | Description | Required | Default |
|-----------|-------------|----------|---------|
| Location | File path where the ORC files are present | True | None |
| Schema | Schema to apply to the loaded data. Can be defined/edited as JSON or inferred using the Infer Schema button | True | None |
| Recursive File Lookup | Recursively loads files from the given path and disables partition inferring. If the data source explicitly specifies a partitionSpec while recursiveFileLookup is true, an exception is thrown | False | False |


ORC source example

Generated Code

def read_orc(spark: SparkSession) -> DataFrame:
    return spark.read.format("orc").load("/path/to/orc")  # illustrative path


Write data as ORC files at the specified path.

Target Parameters

| Parameter | Description | Required | Default |
|-----------|-------------|----------|---------|
| Location | File path where the ORC files will be written | True | None |
| Compression | Compression codec to use when saving to file. One of the known case-insensitive shortened names (none, uncompressed, snappy, gzip, lzo, brotli, lz4, zstd). This overrides orc.compress | False | snappy |
| Write Mode | Write mode for the DataFrame | True | error |
| Partition Columns | List of columns to partition the ORC files by | False | None |


ORC target example

Generated Code

def write_orc(spark: SparkSession, in0: DataFrame):
    in0.write.format("orc").mode("error").save("/path/to/output")  # illustrative path

To learn more about tuning ORC-related properties in the Spark configuration, see the Spark SQL documentation.