The Medallion Architecture (also called Multi-Hop Architecture) is a widely used design pattern for organizing data in a Data Lakehouse. Popularized by Databricks, it divides your data pipeline into three distinct quality layers:
The Three Layers
🥉 Bronze Layer (Raw)
The landing zone for raw, unprocessed data:
- What goes here: Exact copies of source data — JSON payloads, CDC logs, CSV dumps
- Format: Usually stored as-is, with added metadata (ingestion timestamp, source system)
- Purpose: Single source of truth; downstream layers can always be rebuilt by replaying from Bronze
- Example: Raw Salesforce API responses, Kafka event streams, database replication logs
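As a rough illustration of the Bronze rules above, the sketch below wraps each payload with ingestion metadata and only ever appends. The function names, the field names (`payload`, `source_system`, `ingested_at`), and the in-memory list standing in for a table are all assumptions made for this example, not part of any particular tool.

```python
from datetime import datetime, timezone


def to_bronze_record(raw_payload: str, source_system: str) -> dict:
    """Wrap an untouched source payload with ingestion metadata."""
    return {
        "payload": raw_payload,  # stored exactly as received; no parsing or fixing here
        "source_system": source_system,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_bronze(bronze_table: list, raw_payload: str, source_system: str) -> None:
    """Append-only write: existing Bronze rows are never updated or deleted."""
    bronze_table.append(to_bronze_record(raw_payload, source_system))


# Hypothetical in-memory stand-in for a Bronze table or object-store path.
bronze = []
raw = '{"id": 1, "email": "A@Example.com"}'
append_to_bronze(bronze, raw, "salesforce")
append_to_bronze(bronze, raw, "salesforce")  # duplicates are kept; deduplication is Silver's job
```

Keeping the payload as an opaque string is a deliberate choice in this sketch: a malformed record still lands in Bronze and can be replayed once the downstream parser is fixed.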
🥈 Silver Layer (Cleansed)
Data that has been validated, deduplicated, and conformed:
- What happens here: Schema enforcement, null handling, deduplication, type casting
- Format: Strongly typed, partitioned, stored in columnar format (Parquet/Delta)
- Purpose: Enterprise-wide "clean" data that is usable across teams
- Example: A unified `customers` table combining Salesforce + Stripe + internal DB records
🥇 Gold Layer (Business-Ready)
Aggregated, enriched data tailored for specific business use cases:
- What goes here: Business metrics, KPI tables, feature stores, ML training sets
- Format: Star/snowflake schemas optimized for BI tools
- Purpose: Powers dashboards, reports, and ML models directly
- Example: `daily_revenue_by_region`, `customer_lifetime_value`, `churn_predictions`
Why It Works
````
Source Systems → [Bronze] → [Silver] → [Gold] → BI / ML
                 Raw        Clean      Aggregated
                 Append     Validated  Business Logic
                 Replay     Conformed  Star Schema
````
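The two hops in the flow above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the order records, the field names, and the policy of dropping rows with a null amount are invented for the example, and a real pipeline would more likely use Spark or dbt against actual tables.

```python
import json
from collections import defaultdict

# Hypothetical Bronze rows: raw JSON strings plus ingestion metadata.
bronze_orders = [
    {"payload": '{"order_id": "o1", "region": "EMEA", "amount": "10.50", "day": "2024-01-01"}', "source_system": "shop"},
    {"payload": '{"order_id": "o1", "region": "EMEA", "amount": "10.50", "day": "2024-01-01"}', "source_system": "shop"},
    {"payload": '{"order_id": "o2", "region": "emea ", "amount": "4.50", "day": "2024-01-01"}', "source_system": "shop"},
    {"payload": '{"order_id": "o3", "region": "APAC", "amount": null, "day": "2024-01-01"}', "source_system": "shop"},
]


def bronze_to_silver(rows):
    """Parse, cast types, skip null amounts, conform region names, deduplicate on order_id."""
    seen, silver = set(), []
    for row in rows:
        record = json.loads(row["payload"])
        if record.get("amount") is None or record["order_id"] in seen:
            continue  # assumed policy: drop nulls and repeated order_ids
        seen.add(record["order_id"])
        silver.append({
            "order_id": record["order_id"],
            "region": record["region"].strip().upper(),  # conform "emea " to "EMEA"
            "amount_cents": round(float(record["amount"]) * 100),  # cast string to integer cents
            "day": record["day"],
        })
    return silver


def silver_to_gold(silver):
    """Aggregate clean orders into a daily_revenue_by_region style table."""
    totals = defaultdict(int)
    for r in silver:
        totals[(r["day"], r["region"])] += r["amount_cents"]
    return [
        {"day": day, "region": region, "revenue_cents": cents}
        for (day, region), cents in sorted(totals.items())
    ]


silver_orders = bronze_to_silver(bronze_orders)
daily_revenue_by_region = silver_to_gold(silver_orders)
```

Because the Gold step reads only from Silver, a second Gold table (say, revenue by customer) could reuse `silver_orders` without repeating any of the cleaning logic.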
Key Benefits
- Incremental Processing: Each layer only processes what changed
- Debuggability: When something breaks, trace it back through the layers
- Reusability: Silver layer serves multiple Gold-layer consumers
- Governance: Apply access controls at the appropriate layer
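One common way to get incremental processing is a high-water mark: each run handles only rows ingested after the last timestamp it saw. The sketch below assumes Bronze rows carry an `ingested_at` field holding ISO-8601 UTC strings in a single fixed format (so that string comparison matches time order); the function and field names are hypothetical.

```python
def process_increment(bronze_rows, silver_rows, watermark):
    """Copy only rows ingested after `watermark` into Silver; return the new watermark."""
    new_rows = [r for r in bronze_rows if r["ingested_at"] > watermark]
    for r in new_rows:
        silver_rows.append({"id": r["id"], "ingested_at": r["ingested_at"]})
    # If nothing new arrived, the watermark stays where it was.
    return max((r["ingested_at"] for r in new_rows), default=watermark)


# Hypothetical Bronze rows with fixed-format UTC timestamps.
bronze = [
    {"id": 1, "ingested_at": "2024-01-01T00:00:00Z"},
    {"id": 2, "ingested_at": "2024-01-02T00:00:00Z"},
]
silver = []

watermark = process_increment(bronze, silver, "")  # first run: everything is new
bronze.append({"id": 3, "ingested_at": "2024-01-03T00:00:00Z"})
watermark = process_increment(bronze, silver, watermark)  # second run: only id 3 is processed
```

A strict `>` comparison can miss rows that share the exact watermark timestamp, so real pipelines often rely on change feeds, monotonically increasing offsets, or an overlap window instead.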
Medallion Architecture Tools
| Layer | Common Tools |
|-------|-------------|
| Bronze | Fivetran, Airbyte, Kafka Connect, AWS Glue |
| Silver | dbt, Spark, Snowflake Tasks, Dataform |
| Gold | dbt, Looker, Power BI, Tableau |
Anti-Patterns to Avoid
1. Skipping Silver: Going directly from Bronze to Gold creates fragile pipelines, because every Gold table has to repeat its own cleaning logic
2. Too Many Layers: Some teams add Platinum, Diamond — keep it simple
3. No Schema Enforcement: Silver should enforce schemas strictly
4. Ignoring Bronze Retention: Bronze is your backup; don't delete it too aggressively
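To make the third point concrete, here is a minimal sketch of strict schema enforcement at the Silver boundary. The schema, the field names, and the choice to collect rejected records alongside the reason are assumptions for illustration; engines such as Delta Lake or dbt tests provide their own mechanisms for this.

```python
# Hypothetical Silver schema: field name -> required Python type.
SILVER_CUSTOMER_SCHEMA = {"customer_id": int, "email": str, "signup_date": str}


def enforce_schema(record: dict, schema: dict) -> dict:
    """Return the record if it matches the schema exactly, otherwise raise ValueError."""
    missing = schema.keys() - record.keys()
    extra = record.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")
    for field, expected in schema.items():
        # Exact type match, so a bool is not silently accepted as an int.
        if type(record[field]) is not expected:
            raise ValueError(
                f"{field}: expected {expected.__name__}, got {type(record[field]).__name__}"
            )
    return record


def split_valid(records, schema):
    """Separate records that pass the schema from those that fail, keeping the reason."""
    valid, rejected = [], []
    for record in records:
        try:
            valid.append(enforce_schema(record, schema))
        except ValueError as err:
            rejected.append((record, str(err)))
    return valid, rejected


candidates = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2024-01-01"},
    {"customer_id": "2", "email": "b@example.com", "signup_date": "2024-01-02"},  # wrong type
    {"customer_id": 3, "email": "c@example.com"},  # missing field
]
valid, rejected = split_valid(candidates, SILVER_CUSTOMER_SCHEMA)
```

Rejected records are not lost in this design: the originals remain in Bronze, so they can be reprocessed after the source or the schema is corrected.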