Source & Target
The set of Gems that help with loading and saving data.
File
A collection of Gems related to working with various file-based formats.
Name | Description |
---|---|
Avro | Avro is a row-based storage format for Hadoop that is widely used as a data serialization system. |
CSV | Allows you to read or write a delimited file (often called a comma-separated values, or CSV, file). |
Delta | Reads data from Delta files present at a path and writes Delta files to a path based on configuration. |
Fixed Format | Reads data from fixed-format files using an expected schema, or writes data to fixed-format files using an expected schema. |
Iceberg | Reads data from Iceberg files present at a path and writes Iceberg files to a path based on configuration. |
JSON | Allows you to read or write JSON files. |
Kafka | Currently connects to Kafka brokers in batch mode. |
ORC | ORC (Optimized Row Columnar) is a columnar file format designed for Spark/Hadoop workloads. |
Parquet | Parquet is an open-source file format built for flat, columnar data storage. |
Text | This Gem allows you to read from or write to text files. |
XLSX (Excel) | Allows you to read or write Excel-compatible files. |
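The Fixed Format entry is described only as reading or writing "with expected schema." The plain-Python sketch below illustrates what that involves, assuming a schema is an ordered list of column names with character widths. The `FixedField` type and both helper functions are hypothetical illustrations, not Prophecy's API or generated code: on read, each line is sliced at fixed offsets; on write, each value is padded to its column width.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical schema entry: a column name plus its fixed character width.
@dataclass(frozen=True)
class FixedField:
    name: str
    width: int


def parse_fixed_line(line: str, schema: List[FixedField]) -> Dict[str, str]:
    """Slice one fixed-width line into named fields, stripping padding spaces."""
    record = {}
    pos = 0
    for field in schema:
        record[field.name] = line[pos:pos + field.width].strip()
        pos += field.width
    return record


def format_fixed_line(record: Dict[str, str], schema: List[FixedField]) -> str:
    """Left-justify each value to its column width; reject values that do not fit."""
    parts = []
    for field in schema:
        value = str(record[field.name])
        if len(value) > field.width:
            raise ValueError(f"{field.name!r} exceeds width {field.width}")
        parts.append(value.ljust(field.width))
    return "".join(parts)


schema = [FixedField("id", 4), FixedField("name", 10), FixedField("country", 2)]
line = format_fixed_line({"id": "7", "name": "Ada", "country": "UK"}, schema)
print(parse_fixed_line(line, schema))  # {'id': '7', 'name': 'Ada', 'country': 'UK'}
```

Because every record occupies the same 16 characters in this example, the schema alone determines where each column starts; no delimiter is needed, which is the main difference from CSV.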
Warehouse
A collection of Gems specializing in connecting to warehouse-style data sources.
Name | Description |
---|---|
BigQuery | Allows you to read data from or write data to the BigQuery warehouse, using a high-performance connector. Enterprise only. |
CosmosDB | Allows you to read data from or write data to a CosmosDB database. |
DB2 | Allows you to read data from or write data to the DB2 warehouse, using a high-performance connector. Enterprise only. |
JDBC | Allows you to read data from or write data to a database through JDBC. |
MongoDB | Allows you to read data from or write data to a MongoDB database. |
Oracle | Allows you to read data from or write data to the Oracle warehouse, using a high-performance connector. Enterprise only. |
Redshift | Allows you to read data from or write data to the Redshift warehouse, using a high-performance connector. Enterprise only. |
Salesforce | Allows you to read data from or write data to Salesforce. |
Snowflake | Allows you to read data from or write data to the Snowflake warehouse, using a high-performance connector. Enterprise only. |
Teradata | Allows you to read data from or write data to the Teradata warehouse, using a high-performance connector. Enterprise only. |
Catalog
A collection of Gems related to working with various table-based formats.
Name | Description |
---|---|
Delta | Reads data from Delta tables saved in the data catalog and writes data into Delta tables in a managed metastore. |
Hive | Reads from or writes to tables managed by a Hive metastore. |
Lookup
Lookup is a special component that allows you to broadcast any data so that it can later be used anywhere in your Pipeline.
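Conceptually, broadcasting means building one small, read-only copy of a dataset and making it available wherever it is needed, so later steps can do key lookups instead of a full join. The plain-Python sketch below shows only that access pattern, assuming the lookup is a key/value mapping. The dataset and function names are hypothetical, and this is not the code the Lookup Gem generates; in Spark the same idea is typically expressed with a broadcast variable or a broadcast join.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical small reference dataset: country code -> country name.
COUNTRY_ROWS = [("US", "United States"), ("DE", "Germany"), ("IN", "India")]


def build_lookup(rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Materialize the small dataset once as an in-memory key -> value map."""
    return {key: value for key, value in rows}


def enrich(orders: List[dict], lookup: Dict[str, str]) -> List[dict]:
    """Add a column to each row from the shared map; unknown keys become None."""
    return [{**order, "country_name": lookup.get(order["country"])} for order in orders]


lookup = build_lookup(COUNTRY_ROWS)
orders = [{"id": 1, "country": "DE"}, {"id": 2, "country": "FR"}]
print(enrich(orders, lookup))
# [{'id': 1, 'country': 'DE', 'country_name': 'Germany'}, {'id': 2, 'country': 'FR', 'country_name': None}]
```

The same `lookup` map can be passed to any number of later steps without rebuilding it, which is the property that makes a broadcast lookup cheap to reuse across a Pipeline.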
Synthetic Data Generator
If you don't have the data you need, try generating fake data. Using the Synthetic Data Generator Gem, you can specify columns with various datatypes and populate fields with randomly generated data. Specify the boundaries for each row, the percentage of rows which should have null values, etc. It's not real data but it's the next best thing!
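To make the idea concrete, here is a minimal plain-Python sketch of a synthetic data generator, assuming each column is described by a value generator with its own bounds and a null fraction (the share of rows that get a null in that column). `ColumnSpec` and `generate_rows` are hypothetical names for illustration only; they are not the Gem's configuration fields or generated code.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ColumnSpec:
    name: str
    generate: Callable[[random.Random], Any]  # produces one random, non-null value
    null_fraction: float = 0.0  # probability (0.0 to 1.0) that a row gets None here


def generate_rows(specs: List[ColumnSpec], n_rows: int, seed: int = 42) -> List[Dict[str, Any]]:
    """Build n_rows dicts, filling each column independently from its spec."""
    rng = random.Random(seed)  # seeded so the same inputs give the same data
    rows = []
    for _ in range(n_rows):
        row = {}
        for spec in specs:
            if rng.random() < spec.null_fraction:
                row[spec.name] = None
            else:
                row[spec.name] = spec.generate(rng)
        rows.append(row)
    return rows


specs = [
    ColumnSpec("id", lambda r: r.randint(1, 10_000)),  # integer in [1, 10000]
    ColumnSpec("price", lambda r: round(r.uniform(0.5, 99.99), 2), null_fraction=0.1),
    ColumnSpec("category", lambda r: r.choice(["books", "games", "music"])),
]
for row in generate_rows(specs, 5):
    print(row)
```

Seeding the generator is a deliberate choice here: fake data is most useful for testing when a rerun reproduces exactly the same rows.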