Datasets
In Prophecy, datasets are grouped by project, and each dataset relies on the following:
- Schema: The structure or shape of the data, including column names, data types, and the method for reading and writing the data in this format.
- Fabric: The execution environment in which the data resides.
Create datasets
A dataset is created where it is first used, in a Source or Target gem. A dataset definition includes its:
- Type: The type of data you are reading or writing, such as CSV files, Parquet files, or catalog tables.
- Location: The location of your data. For example, this could be a file path for a CSV file or a table name for a catalog table.
- Properties: The schema, plus other attributes specific to the file format. For example, for CSV you can set the column delimiter as an additional attribute. You can also define metadata for each column here, such as description, tags, and mappings.
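To make the three parts of a definition concrete, here is a minimal Python sketch of how Type, Location, and Properties might fit together for a CSV dataset. It is a conceptual model only, not Prophecy's API: the `ColumnSpec` and `DatasetDefinition` classes, the `read_csv_rows` helper, the `column_delimiter` key, and the `/data/orders.csv` path are all hypothetical names chosen for illustration.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """One schema column plus optional per-column metadata (hypothetical)."""
    name: str
    data_type: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


@dataclass
class DatasetDefinition:
    """Hypothetical stand-in for a dataset definition; not a Prophecy class."""
    type: str                 # Type: e.g. "csv", "parquet", "catalog_table"
    location: str             # Location: a file path or a table name
    schema: list[ColumnSpec]  # Properties: the schema
    attributes: dict[str, str] = field(default_factory=dict)  # Properties: format-specific


def read_csv_rows(dataset: DatasetDefinition, text: str) -> list[dict[str, str]]:
    """Parse CSV text using the delimiter and column names from the definition."""
    if dataset.type != "csv":
        raise ValueError(f"unsupported type: {dataset.type}")
    delimiter = dataset.attributes.get("column_delimiter", ",")
    names = [col.name for col in dataset.schema]
    reader = csv.DictReader(io.StringIO(text), fieldnames=names, delimiter=delimiter)
    return list(reader)


orders = DatasetDefinition(
    type="csv",
    location="/data/orders.csv",  # illustrative path, assumed for this sketch
    schema=[
        ColumnSpec("order_id", "integer", description="Unique order key"),
        ColumnSpec("amount", "double", tags=["finance"]),
    ],
    attributes={"column_delimiter": "|"},
)

# Inline text stands in for the file contents so the sketch runs anywhere.
rows = read_csv_rows(orders, "1|9.99\n2|24.50\n")
print(rows)
```

The sketch keeps the schema and the format-specific attributes in separate fields because the schema applies to every dataset type, while an attribute such as the column delimiter only makes sense for CSV.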
Datasets can be used by any pipeline within the same project, and in some cases by other projects within the same team.