
Source & Target

Source and Target Gems are the set of Gems that help with loading and saving data.

File

A collection of Gems related to working with various file-based formats.

Name | Description
Avro | Avro is a row-based storage format for Hadoop that is widely used as a serialization platform.
CSV | Allows you to read or write a delimited file (often called a comma-separated values file, or CSV).
Delta | Reads data from Delta files present at a path and writes Delta files to a path based on configuration.
Fixed Format | Reads data from fixed format files with an expected schema, or writes data to fixed format files with an expected schema.
Iceberg | Reads data from Iceberg files present at a path and writes Iceberg files to a path based on configuration.
JSON | Allows you to read or write JSON files.
Kafka | This source currently connects to Kafka brokers in batch mode.
ORC | ORC (Optimized Row Columnar) is a columnar file format designed for Spark/Hadoop workloads.
Parquet | Parquet is an open-source file format built for flat columnar data storage.
Text | Allows you to read from or write to text files.
XLSX (Excel) | Allows you to read or write Excel-compatible files.
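
To illustrate what reading and writing "with an expected schema" means for the Fixed Format entry above, here is a minimal plain-Python sketch. It is not the Gem's implementation or API: the `Field` type, the `SCHEMA` columns and widths, and the helper names `parse_line` and `format_record` are all hypothetical and exist only for this illustration.

```python
from typing import Callable, List, NamedTuple


class Field(NamedTuple):
    name: str                      # column name
    width: int                     # number of characters the column occupies
    cast: Callable[[str], object]  # converts the trimmed text to a typed value


# Hypothetical schema: names and widths are made up for illustration.
SCHEMA: List[Field] = [
    Field("id", 4, int),
    Field("name", 10, str),
    Field("amount", 8, float),
]


def parse_line(line: str, schema: List[Field]) -> dict:
    """Slice one fixed-width record into typed fields."""
    expected = sum(f.width for f in schema)
    if len(line) != expected:
        raise ValueError(f"expected {expected} characters, got {len(line)}")
    record, offset = {}, 0
    for f in schema:
        raw = line[offset:offset + f.width].strip()
        record[f.name] = f.cast(raw)
        offset += f.width
    return record


def format_record(record: dict, schema: List[Field]) -> str:
    """Write a record back out, padding each field to its declared width."""
    parts = []
    for f in schema:
        text = str(record[f.name])
        if len(text) > f.width:
            raise ValueError(f"value for {f.name!r} exceeds width {f.width}")
        parts.append(text.ljust(f.width))
    return "".join(parts)
```

Because the schema fixes every column's position and width, a line of the wrong length is rejected up front rather than silently misaligned.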

Warehouse

A collection of Gems specializing in connecting to warehouse-style data sources.

Name | Description
BigQuery | Allows you to read data from or write data to the BigQuery warehouse, using a high-performance connector. Enterprise only.
CosmosDB | Allows you to read data from or write data to the CosmosDB database.
DB2 | Allows you to read data from or write data to the DB2 warehouse, using a high-performance connector. Enterprise only.
JDBC | Allows you to read data from or write data to a database over JDBC.
MongoDB | Allows you to read data from or write data to the MongoDB database.
Oracle | Allows you to read data from or write data to the Oracle warehouse, using a high-performance connector. Enterprise only.
Redshift | Allows you to read data from or write data to the Redshift warehouse, using a high-performance connector. Enterprise only.
Salesforce | Allows you to read data from or write data to the Salesforce warehouse.
Snowflake | Allows you to read data from or write data to the Snowflake warehouse, using a high-performance connector. Enterprise only.
Teradata | Allows you to read data from or write data to the Teradata warehouse, using a high-performance connector. Enterprise only.

Catalog

A collection of Gems related to working with various table-based formats.

Name | Description
Delta | Reads data from Delta tables saved in a data catalog and writes data into a Delta table in a managed metastore.
Hive | Reads from or writes to tables managed by a Hive metastore.

Lookup

Lookup is a special component that allows you to broadcast any data so that it can be used later anywhere in your Pipeline.
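
The idea behind a lookup can be sketched in plain Python: a small key-to-value table is built once and then consulted wherever rows need enriching. This is only a conceptual illustration, not the Lookup component's API. The function names `build_lookup` and `enrich` and the sample column names are assumptions made up for this sketch.

```python
from typing import Dict, Iterable, List


def build_lookup(rows: Iterable[dict], key: str, value: str) -> Dict[object, object]:
    """Collapse a small reference dataset into a key -> value mapping."""
    return {row[key]: row[value] for row in rows}


def enrich(rows: Iterable[dict], lookup: Dict[object, object],
           key: str, target: str, default=None) -> List[dict]:
    """Add a `target` column to each row by consulting the lookup."""
    return [{**row, target: lookup.get(row[key], default)} for row in rows]


# Hypothetical reference data and input rows.
countries = [
    {"code": "DE", "country": "Germany"},
    {"code": "JP", "country": "Japan"},
]
orders = [
    {"order_id": 1, "code": "JP"},
    {"order_id": 2, "code": "XX"},  # no match in the reference data
]

country_by_code = build_lookup(countries, key="code", value="country")
enriched = enrich(orders, country_by_code, key="code", target="country",
                  default="unknown")
```

In a distributed engine, the mapping would be shipped to every worker so each one can resolve keys locally, which is why this pattern suits small reference datasets.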

Synthetic Data Generator

If you don't have the data you need, try generating fake data. Using the Synthetic Data Generator Gem, you can specify columns with various datatypes and populate fields with randomly generated data. You can also specify settings such as the boundaries for each row and the percentage of rows that should have null values. It's not real data, but it's the next best thing!
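
As a rough illustration of the idea (typed columns, value bounds, and a share of nulls), here is a minimal plain-Python sketch. It does not reflect how the Synthetic Data Generator Gem is implemented or configured. Every name in it (`int_column`, `float_column`, `choice_column`, `generate_rows`, `null_fraction`) is hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

ColumnGen = Callable[[random.Random], object]


def int_column(low: int, high: int) -> ColumnGen:
    """Integers drawn uniformly from [low, high], both ends included."""
    return lambda rng: rng.randint(low, high)


def float_column(low: float, high: float) -> ColumnGen:
    """Floats drawn uniformly between low and high."""
    return lambda rng: rng.uniform(low, high)


def choice_column(values: Sequence[object]) -> ColumnGen:
    """One of a fixed set of values, chosen uniformly."""
    return lambda rng: rng.choice(values)


def generate_rows(columns: Dict[str, ColumnGen], n_rows: int,
                  null_fraction: float = 0.0,
                  seed: Optional[int] = None) -> List[dict]:
    """Build n_rows dicts; each field is None with probability null_fraction."""
    if not 0.0 <= null_fraction <= 1.0:
        raise ValueError("null_fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    rows = []
    for _ in range(n_rows):
        row = {}
        for name, gen in columns.items():
            row[name] = None if rng.random() < null_fraction else gen(rng)
        rows.append(row)
    return rows


# Hypothetical column specification.
spec = {
    "age": int_column(18, 90),
    "score": float_column(0.0, 1.0),
    "plan": choice_column(["free", "pro", "enterprise"]),
}
sample = generate_rows(spec, n_rows=100, null_fraction=0.1, seed=7)
```

Passing the same `seed` yields the same rows, which helps when the generated data feeds repeatable tests.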