🏢 Data Warehousing

Medallion Architecture

A data design pattern that organizes data into three layers — Bronze (raw), Silver (cleaned), and Gold (business-ready) — to progressively improve data quality in a lakehouse.

The Medallion Architecture (also called Multi-Hop Architecture) is a widely used design pattern for organizing data in a data lakehouse. Popularized by Databricks, it divides a data pipeline into three distinct quality layers:

The Three Layers

🥉 Bronze Layer (Raw)


The landing zone for raw, unprocessed data:
- What goes here: Exact copies of source data — JSON payloads, CDC logs, CSV dumps
- Format: Usually stored as-is, with added metadata (ingestion timestamp, source system)
- Purpose: Single source of truth; you can always replay downstream layers from Bronze
- Example: Raw Salesforce API responses, Kafka event streams, database replication logs
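
As a minimal sketch of the Bronze idea, the snippet below wraps each raw payload with ingestion metadata and appends it without parsing or modifying it. The function names, the in-memory list standing in for a Bronze table, and the field names (`payload`, `source_system`, `ingested_at`) are illustrative assumptions, not part of any specific platform.

```python
from datetime import datetime, timezone


def to_bronze_record(raw_payload: str, source_system: str) -> dict:
    """Wrap a raw payload with ingestion metadata without altering it."""
    return {
        "payload": raw_payload,  # stored exactly as received, not parsed
        "source_system": source_system,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_bronze(bronze: list, raw_payloads: list, source_system: str) -> list:
    """Append-only write: existing Bronze rows are never updated or deleted."""
    bronze.extend(to_bronze_record(p, source_system) for p in raw_payloads)
    return bronze


# Hypothetical in-memory stand-in for a Bronze table.
bronze_table = []
append_to_bronze(bronze_table, ['{"id": 1, "email": "A@Example.com"}'], "salesforce")
```

Keeping the payload as an unparsed string means a malformed record still lands in Bronze and can be inspected or replayed later.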

🥈 Silver Layer (Cleansed)


Data that has been validated, deduplicated, and conformed:
- What happens here: Schema enforcement, null handling, deduplication, type casting
- Format: Strongly typed, partitioned, stored in columnar format (Parquet/Delta)
- Purpose: Enterprise-wide "clean" data that is usable across teams
- Example: A unified customers table combining Salesforce + Stripe + internal DB records
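
The Silver steps listed above (schema checks, null handling, type casting, deduplication) might look roughly like this in plain Python. The row shape, the field names, and the "keep the latest ingested version" rule are assumptions made for illustration; a real pipeline would typically express the same logic in Spark, dbt, or SQL.

```python
import json


def to_silver(bronze_rows: list) -> list:
    """Validate, type-cast, and deduplicate Bronze rows into Silver rows."""
    latest_by_id = {}
    for row in bronze_rows:
        try:
            record = json.loads(row["payload"])
            customer_id = int(record["id"])          # type casting
            email = record["email"].strip().lower()  # conform formatting
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # reject rows that fail the schema checks
        if not email:
            continue  # null handling: an empty email is treated as invalid
        # Deduplicate: keep the most recently ingested version of each id.
        current = latest_by_id.get(customer_id)
        if current is None or row["ingested_at"] > current["ingested_at"]:
            latest_by_id[customer_id] = {
                "customer_id": customer_id,
                "email": email,
                "source_system": row["source_system"],
                "ingested_at": row["ingested_at"],
            }
    return sorted(latest_by_id.values(), key=lambda r: r["customer_id"])


# Hypothetical Bronze rows: a duplicate id, a missing field, and invalid JSON.
bronze_rows = [
    {"payload": '{"id": "1", "email": " A@Example.com "}',
     "source_system": "salesforce", "ingested_at": "2026-01-01T00:00:00+00:00"},
    {"payload": '{"id": 1, "email": "a@example.com"}',
     "source_system": "stripe", "ingested_at": "2026-01-02T00:00:00+00:00"},
    {"payload": '{"id": 2}',
     "source_system": "stripe", "ingested_at": "2026-01-02T00:00:00+00:00"},
    {"payload": "not json",
     "source_system": "internal_db", "ingested_at": "2026-01-03T00:00:00+00:00"},
]
silver = to_silver(bronze_rows)
```

Only one row survives here: customer 1, taken from the later Stripe record. The timestamp comparison relies on all `ingested_at` values sharing the same ISO 8601 format and offset.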

🥇 Gold Layer (Business-Ready)


Aggregated, enriched data tailored for specific business use cases:
- What goes here: Business metrics, KPI tables, feature stores, ML training sets
- Format: Star/snowflake schemas optimized for BI tools
- Purpose: Powers dashboards, reports, and ML models directly
- Example: daily_revenue_by_region, customer_lifetime_value, churn_predictions
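
To illustrate a Gold table such as daily_revenue_by_region, here is a small sketch that aggregates already-clean Silver orders. The order fields (`order_date`, `region`, `amount`) are hypothetical, and the use of `Decimal` is a design choice to avoid floating-point rounding on currency values.

```python
from collections import defaultdict
from decimal import Decimal


def daily_revenue_by_region(silver_orders: list) -> list:
    """Aggregate clean order rows into one revenue row per (date, region)."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for order in silver_orders:
        totals[(order["order_date"], order["region"])] += order["amount"]
    return [
        {"order_date": order_date, "region": region, "revenue": revenue}
        for (order_date, region), revenue in sorted(totals.items())
    ]


# Hypothetical Silver orders, assumed already validated and typed.
silver_orders = [
    {"order_date": "2026-02-01", "region": "EMEA", "amount": Decimal("120.50")},
    {"order_date": "2026-02-01", "region": "EMEA", "amount": Decimal("79.50")},
    {"order_date": "2026-02-01", "region": "APAC", "amount": Decimal("40.00")},
]
gold = daily_revenue_by_region(silver_orders)
```

Because the input is assumed to be clean, the Gold step contains only business logic and no validation code.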

Why It Works

```
Source Systems → [Bronze] → [Silver] → [Gold] → BI / ML
                 Raw        Clean      Aggregated
                 Append     Validated  Business Logic
                 Replay     Conformed  Star Schema
```

Key Benefits

- Incremental Processing: Each layer only processes what changed
- Debuggability: When something breaks, trace it back through the layers
- Reusability: Silver layer serves multiple Gold-layer consumers
- Governance: Apply access controls at the appropriate layer
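
One common way to get incremental processing is a high-water mark: each run reads only rows ingested after the last timestamp it has already handled. The sketch below assumes every row carries an `ingested_at` string in a consistent ISO 8601 format; the function name and row shape are illustrative.

```python
def rows_since(rows: list, watermark: str) -> tuple:
    """Return rows ingested after the watermark, plus the advanced watermark."""
    fresh = [r for r in rows if r["ingested_at"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((r["ingested_at"] for r in fresh), default=watermark)
    return fresh, new_watermark


# Hypothetical upstream rows.
bronze_rows = [
    {"id": 1, "ingested_at": "2026-02-01T08:00:00+00:00"},
    {"id": 2, "ingested_at": "2026-02-02T08:00:00+00:00"},
]

# First run: an empty watermark selects everything.
first_batch, watermark = rows_since(bronze_rows, "")

# A new row arrives; the second run picks up only that row.
bronze_rows.append({"id": 3, "ingested_at": "2026-02-03T08:00:00+00:00"})
second_batch, watermark = rows_since(bronze_rows, watermark)
```

The strict `>` comparison can miss late rows that share the watermark's exact timestamp, so production pipelines often add an overlap window or rely on the platform's change tracking instead.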

Medallion Architecture Tools

| Layer | Common Tools |
|-------|-------------|
| Bronze | Fivetran, Airbyte, Kafka Connect, AWS Glue |
| Silver | dbt, Spark, Snowflake Tasks, Dataform |
| Gold | dbt, Looker, Power BI, Tableau |

Anti-Patterns to Avoid

1. Skipping Silver: Going directly from Bronze to Gold creates fragile pipelines
2. Too Many Layers: Some teams add extra tiers such as Platinum or Diamond; keep it simple
3. No Schema Enforcement: Silver should enforce schemas strictly
4. Ignoring Bronze Retention: Bronze is your backup; don't delete it too aggressively
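
For the schema enforcement point, one possible approach is to check every record against an explicit schema and quarantine failures with a reason, rather than letting them flow into Silver. The schema, field names, and quarantine structure below are hypothetical.

```python
# Hypothetical expected schema for a Silver customers table.
EXPECTED_SCHEMA = {"customer_id": int, "email": str, "signup_date": str}


def schema_violations(record: dict, schema: dict = EXPECTED_SCHEMA) -> list:
    """List every way a record deviates from the expected schema."""
    problems = []
    for field, expected_type in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


def split_valid_and_quarantined(records: list, schema: dict = EXPECTED_SCHEMA) -> tuple:
    """Route clean records to Silver and failing records to a quarantine list."""
    valid, quarantined = [], []
    for record in records:
        problems = schema_violations(record, schema)
        if problems:
            quarantined.append({"record": record, "problems": problems})
        else:
            valid.append(record)
    return valid, quarantined


records = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2026-01-01"},
    {"customer_id": "2", "email": "b@example.com"},  # wrong type, missing field
]
valid, quarantined = split_valid_and_quarantined(records)
```

Quarantining with reasons keeps rejected data visible for debugging, and because Bronze still holds the original payloads, the rows can be reprocessed once the upstream issue is fixed.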

Frequently Asked Questions

What is the Medallion Architecture?

The Medallion Architecture is a data design pattern that organizes data into three progressive quality layers — Bronze (raw), Silver (cleansed), and Gold (business-ready) — in a lakehouse environment.

What is the difference between the Bronze, Silver, and Gold layers?

Bronze stores raw unprocessed data exactly as ingested. Silver cleans, validates, and deduplicates that data. Gold applies business logic to create aggregated, analytics-ready datasets.

Is Medallion Architecture only for Databricks?

No. While Databricks popularized it, the Medallion Architecture works on any platform — Snowflake, BigQuery, Azure Synapse, or open-source tools like dbt + Apache Iceberg.

Can I use Medallion Architecture with Snowflake?

Yes. You can implement it using Snowflake databases or schemas for each layer (e.g., RAW_DB, CLEAN_DB, ANALYTICS_DB) and use dbt or Snowflake Tasks for transformations.

Last updated: 2026-02-27

Published by

Sainath Reddy

Data Engineer at Anblicks
🎯 4+ years experience