What is Lineage?
Lineage tells us about the life cycle of data. Data lineage is the process of understanding, recording, and visualizing data as it flows from source to destination, including every transformation the data undergoes along the way.
Why is it important?
Just knowing the source of a particular data set is not always enough to understand its importance, perform error resolution, understand process changes, and perform system migrations and updates.
Knowing how data is updated, and by which transformations, improves overall data quality. It also lets data custodians ensure that the integrity and confidentiality of data are protected throughout its lifecycle.
Data lineage allows companies to:
- Track errors in data processes.
- Improve overall data quality.
- Implement process changes and system migrations with lower risk and more confidence.
- Combine data discovery with a comprehensive view of metadata.
- Improve overall data governance.
There are two ways to get to the lineage view:
- Directly from Metadata, by clicking the button shown in the image below.
- Using the Lineage Search option from the left side pane.
Lineage is always computed on demand, directly from the code in Git. You can therefore make experimental changes in a branch, see how they affect the overall lineage, and fix any errors.
Use the Browse Datasets option on the right-hand side to search for and select the column or entity whose lineage you want to compute.
Use the Zoom-in toggle on a particular Pipeline or Dataset to inspect it in more detail.
Pipeline Zoom-In View
This view shows code-level information for all the components in the Pipeline.
Select a component, as shown below, to get its code-level view.
Dataset Zoom-In View
This view shows the upstream and downstream transformations, if any, for every column of the selected Dataset.
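To make "upstream" and "downstream" concrete, here is a minimal Python sketch of column-level lineage modeled as a directed graph. It is only an illustration of the concept, not how the product computes lineage. The column names, transformation labels, and the helpers `build_index` and `trace` are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical column-level lineage edges:
# (source column, transformation, target column).
EDGES = [
    ("raw_orders.amount", "cast to decimal", "clean_orders.amount"),
    ("raw_orders.customer_id", "pass through", "clean_orders.customer_id"),
    ("clean_orders.amount", "sum by customer", "customer_totals.total_amount"),
    ("clean_orders.customer_id", "group key", "customer_totals.customer_id"),
]


def build_index(edges):
    """Return (upstream, downstream) adjacency maps keyed by column name."""
    upstream = defaultdict(list)
    downstream = defaultdict(list)
    for src, transform, dst in edges:
        downstream[src].append((dst, transform))
        upstream[dst].append((src, transform))
    return upstream, downstream


def trace(column, index):
    """Breadth-first walk from `column`; returns reachable columns in visit order."""
    seen = {column}
    order = []
    queue = deque([column])
    while queue:
        current = queue.popleft()
        for neighbor, _transform in index[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


upstream, downstream = build_index(EDGES)

# Columns that feed clean_orders.amount.
print(trace("clean_orders.amount", upstream))
# Columns that are derived from clean_orders.amount.
print(trace("clean_orders.amount", downstream))
```

Walking the `upstream` map answers "where did this column come from?", while walking the `downstream` map answers "what is affected if this column changes?".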
The Lineage Search option can be used to search lineage by Column, Dataset, or Pipeline.
The following filters are available to narrow down the search results:
- Type: Filter by Datasets, Pipelines, or Columns
- Project: Filter by Project Name
- Author: Filter by Project Author
- Last Modified: Filter by Last Modified Time
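As a rough illustration of how these filters combine, the Python sketch below applies them to an in-memory list of search results. It is not the product's search API. The `LineageEntry` record, the `search` function, and the sample data are hypothetical. It also assumes the Last Modified filter means "modified on or after a given time", which the UI may define differently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LineageEntry:
    """Hypothetical search result record."""
    name: str
    kind: str  # "Dataset", "Pipeline", or "Column"
    project: str
    author: str
    last_modified: datetime


def search(entries, kind=None, project=None, author=None, modified_since=None):
    """Keep entries that match every filter that was supplied (None means no filter)."""
    matches = []
    for entry in entries:
        if kind is not None and entry.kind != kind:
            continue
        if project is not None and entry.project != project:
            continue
        if author is not None and entry.author != author:
            continue
        # Assumption: the time filter keeps entries modified on or after the cutoff.
        if modified_since is not None and entry.last_modified < modified_since:
            continue
        matches.append(entry)
    return matches


# Hypothetical sample data.
ENTRIES = [
    LineageEntry("customers", "Dataset", "sales", "alice", datetime(2023, 1, 10)),
    LineageEntry("load_customers", "Pipeline", "sales", "bob", datetime(2023, 3, 5)),
    LineageEntry("customer_id", "Column", "sales", "alice", datetime(2023, 2, 20)),
    LineageEntry("events", "Dataset", "web", "carol", datetime(2023, 4, 1)),
]

datasets = search(ENTRIES, kind="Dataset")
recent_by_alice = search(
    ENTRIES, project="sales", author="alice", modified_since=datetime(2023, 2, 1)
)
print([entry.name for entry in datasets])
print([entry.name for entry in recent_by_alice])
```

Each filter only narrows the result set, so supplying more filters can never return more entries than supplying fewer.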