The Source component enables you to read data from a variety of input sources for batch ETL workflows. It supports reading from:

  • File
    • Stored in cloud storage
    • Stored in basic formats: JSON, Text, CSV
    • Stored in big data formats: Parquet, ORC, Delta
    • Stored in legacy fixed-width formats: EBCDIC, COBOL
  • Database
    • Stored in any JDBC-compatible database
  • Catalog Table
    • Stored in a table registered with the Hive/Spark Metastore or AWS Glue
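For orientation, the three source types above map roughly to the following Spark read patterns. This is only an illustrative sketch, not Prophecy's generated code: the `spark` session, paths, table names, and credentials are placeholders you would supply yourself.

```python
# Illustrative sketch of the Spark reads behind each source type.
# `spark` is assumed to be an existing SparkSession; all paths, URLs,
# and table names below are placeholders.

def read_file_source(spark, path: str, fmt: str = "csv"):
    """Read a file-based source (CSV, JSON, Parquet, ORC, Delta, ...)."""
    return spark.read.format(fmt).option("header", "true").load(path)

def read_jdbc_source(spark, url: str, table: str, user: str, password: str):
    """Read from any JDBC-compatible database."""
    return (spark.read.format("jdbc")
            .option("url", url)          # e.g. jdbc:postgresql://host:5432/db
            .option("dbtable", table)
            .option("user", user)
            .option("password", password)
            .load())

def read_catalog_table(spark, table: str):
    """Read a table registered in the Hive/Spark Metastore or AWS Glue."""
    return spark.read.table(table)
```

Whichever pattern applies, the result in each case is a DataFrame that downstream pipeline components can transform.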
Note: To use a Source, you must first create a Prophecy Dataset that points to your underlying data. You can do this from within the Source component.

Next: Files