🛡️ Data Governance

Data Lineage

The documentation and visualization of data as it flows from source to destination, showing transformations, dependencies, and ownership at each step.

Data lineage is the record of data's complete lifecycle, from origin to consumption. It documents where data comes from, how it moves through systems, what transformations are applied, and what depends on it. Think of it as a visual map of your data's journey.

Types of Data Lineage

1. Technical Lineage: Column-to-column mappings, SQL transformations
2. Business Lineage: High-level flow between business concepts
3. Operational Lineage: Runtime execution details and timing

Why Data Lineage Matters

- Impact Analysis: Know what breaks if you change a source
- Root Cause Analysis: Trace data issues to their origin
- Compliance: Demonstrate data handling for audits
- Trust: Understand where dashboard numbers come from
- Migration: Plan system changes with confidence
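
Impact analysis and root cause analysis are both graph traversals over the same lineage data: one walks downstream from a source, the other walks upstream from a consumer. The following is a minimal sketch, assuming lineage is already available as a simple dictionary of dataset names. The dataset names and the `DOWNSTREAM`, `invert`, and `reachable` names are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the datasets in its list.
# The sketch assumes the graph has no cycles.
DOWNSTREAM = {
    "crm.customers": ["staging.customers"],
    "shop.orders": ["staging.orders"],
    "staging.customers": ["marts.customer_revenue"],
    "staging.orders": ["marts.customer_revenue"],
    "marts.customer_revenue": ["dashboard.revenue"],
}


def invert(graph):
    """Build the reverse graph: dataset -> its direct upstream datasets."""
    reverse = {}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return reverse


def reachable(graph, start):
    """Breadth-first search returning every node reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Impact analysis: what could break if shop.orders changes?
impacted = reachable(DOWNSTREAM, "shop.orders")

# Root cause analysis: which datasets feed dashboard.revenue?
origins = reachable(invert(DOWNSTREAM), "dashboard.revenue")

print(sorted(impacted))
print(sorted(origins))
```

Keeping a single edge list and inverting it on demand means the downstream and upstream views cannot drift apart.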

Data Lineage Components

- Source: Where data originates (database, API, file)
- Transformation: How data is modified (joins, aggregations)
- Destination: Where data lands (warehouse, dashboard)
- Dependencies: What downstream systems rely on this data
- Metadata: Column names, types, descriptions
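
One way to picture how these components fit together is as a single record per transformation step. The sketch below is an illustration only, not the schema of any particular tool. The `Column` and `LineageRecord` classes, their field names, and the sample values are all assumptions made for this example (Python 3.9+).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    """Metadata for one column: name, type, description."""
    name: str
    dtype: str
    description: str = ""


@dataclass
class LineageRecord:
    """One hop of lineage, mirroring the components listed above."""
    source: str                 # where the data originates
    transformation: str         # how the data is modified
    destination: str            # where the data lands
    dependencies: list[str] = field(default_factory=list)  # downstream consumers
    metadata: list[Column] = field(default_factory=list)   # destination columns


# Hypothetical example values.
record = LineageRecord(
    source="shop.orders",
    transformation="SUM(amount) grouped by customer_id",
    destination="marts.customer_revenue",
    dependencies=["dashboard.revenue"],
    metadata=[
        Column("customer_id", "INTEGER", "Customer identifier"),
        Column("total_revenue", "DECIMAL(12,2)", "Sum of order amounts"),
    ],
)

print(record.destination, [c.name for c in record.metadata])
```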

Lineage Capture Methods

| Method | Pros | Cons |
|--------|------|------|
| SQL Parsing | Accurate, automatic | Complex to implement |
| API Integration | Real-time | Vendor-specific |
| Manual Documentation | Flexible | Goes out of date quickly |
| Log Analysis | Runtime truth | Incomplete picture |
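
To make the SQL parsing row concrete, here is a deliberately naive sketch that pulls table-level lineage out of a single `INSERT INTO ... SELECT` statement with regular expressions. It is an assumption-laden toy: the function name `extract_table_lineage` and the sample table names are hypothetical, and the patterns do not handle CTEs, quoted identifiers, comments, or expressions such as `EXTRACT(YEAR FROM col)`. That gap is a large part of why the table lists this method as complex to implement; production tools use a full SQL parser.

```python
import re

# Naive patterns: assume unquoted identifiers, optionally schema-qualified.
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_table_lineage(sql):
    """Return the target table and the sorted, de-duplicated source tables."""
    target_match = TARGET_RE.search(sql)
    target = target_match.group(1) if target_match else None
    sources = sorted(set(SOURCE_RE.findall(sql)))
    return {"target": target, "sources": sources}


# Hypothetical statement used only for illustration.
SQL = """
INSERT INTO marts.customer_revenue
SELECT c.customer_id, SUM(o.amount) AS total_revenue
FROM staging.orders AS o
JOIN staging.customers AS c ON c.customer_id = o.customer_id
GROUP BY c.customer_id
"""

lineage = extract_table_lineage(SQL)
print(lineage)
```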

Data Lineage Tools

- Atlan: Active metadata platform with lineage
- Alation: Data catalog with lineage
- OpenLineage: Open standard for lineage
- dbt: Built-in lineage for SQL models
- DataHub: Open-source metadata platform
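
As a rough illustration of what an open lineage standard exchanges, the sketch below builds a simplified, OpenLineage-style run event as plain JSON. It is a simplification rather than a spec-complete event: the OpenLineage specification requires further fields (for example a schema URL) that are omitted here. The `build_run_event` helper, the namespace, the producer URL, and the dataset names are all hypothetical. Consult the OpenLineage specification before emitting real events.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_name, inputs, outputs, namespace="example_namespace"):
    """Build a simplified, OpenLineage-style event for one completed job run."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": name} for name in outputs],
        # Hypothetical producer identifier, not a real service.
        "producer": "https://example.com/hypothetical-lineage-producer",
    }


event = build_run_event(
    job_name="build_customer_revenue",
    inputs=["staging.orders", "staging.customers"],
    outputs=["marts.customer_revenue"],
)

print(json.dumps(event, indent=2))
```

Because each event names a job together with its input and output datasets, a backend that collects such events can assemble the lineage graph without parsing the job's code.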

Frequently Asked Questions

What is data lineage in simple terms?

Data lineage is a map showing where your data comes from, how it changes as it moves through systems, and where it ends up. It answers "where did this number in my dashboard come from?"

Why is data lineage important?

Data lineage helps you trust data by showing its origin, debug issues by tracing them to their source, assess the impact of changes before you make them, and demonstrate compliance in audits. Without lineage, each of these questions has to be answered by manually tracing queries and pipelines.

How do you implement data lineage?

Data lineage can be implemented through SQL parsing tools, data catalog platforms like Atlan or Alation, open standards like OpenLineage, or tools like dbt that track lineage automatically for transformations.

What is the difference between data lineage and data provenance?

Data lineage shows the flow and transformation of data across systems. Data provenance focuses on the origin and history of a specific data point, including who created it and when. The terms overlap and are sometimes used interchangeably; provenance is often treated as a subset of lineage.

Last updated: 2026-01-21

Published by

Sainath Reddy

Data Engineer at Anblicks
🎯 4+ years experience