πŸ“ Data Modeling

Star Schema

A data warehouse modeling technique that organizes data into a central fact table surrounded by dimension tables, optimized for fast analytical queries and BI reporting.

The Star Schema is the most widely used data modeling pattern in data warehousing. Named for its star-like shape, it places a central fact table (containing measurable events) surrounded by dimension tables (containing descriptive attributes).

Star Schema Structure

``
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ dim_date β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚dim_productβ”œβ”€β”€β†’ fact_sales ←─── dim_customer β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
β”‚ dim_store β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
`

Key Components

Fact Tables


Store measurable, quantitative data (events that happened):
-
fact_sales: revenue, quantity, discount
-
fact_orders: order_amount, shipping_cost
-
fact_web_events: page_views, session_duration

Types of facts:
- Additive: Can be summed across all dimensions (revenue)
- Semi-additive: Can be summed across some dimensions (account balance)
- Non-additive: Cannot be summed (ratios, percentages)

Dimension Tables


Store descriptive, categorical data (context for facts):
-
dim_customer: name, email, segment, region
-
dim_product: name, category, brand, price_tier
-
dim_date: date, month, quarter, year, is_holiday

Star Schema Example

`sql
-- "Total revenue by product category and quarter"
SELECT
d.quarter,
p.category,
SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.quarter, p.category
ORDER BY d.quarter, total_revenue DESC;
``

Star Schema vs Snowflake Schema

| Aspect | Star Schema | Snowflake Schema |
|--------|------------|------------------|
| Dimension Tables | Denormalized (flat) | Normalized (split into sub-tables) |
| Query Performance | Faster (fewer joins) | Slower (more joins) |
| Storage | More (redundant data) | Less (normalized) |
| Complexity | Simple to understand | More complex relationships |
| Best For | BI/Analytics queries | Storage-constrained environments |

Best Practices

1. Use Surrogate Keys: Integer keys in facts, not natural keys
2. Denormalize Dimensions: Flatten hierarchies for query speed
3. Add a Date Dimension: Always include a dedicated date table
4. Keep Facts Narrow: Only keys and measures in fact tables
5. Use Conformed Dimensions: Share dimension tables across fact tables

Key Points

Frequently Asked Questions

What is a star schema in a data warehouse?

A star schema is a data modeling pattern where a central fact table (containing measurable data like sales amounts) is connected to multiple dimension tables (containing descriptive data like product names, dates, and customers) via foreign keys.

What is the difference between star schema and snowflake schema?

In a star schema, dimension tables are denormalized (flat). In a snowflake schema, dimensions are normalized into sub-tables. Star schema is faster for queries but uses more storage; snowflake schema saves storage but requires more joins.

When should I use a star schema?

Use a star schema when you need fast query performance for BI dashboards and reports. It's the default choice for most data warehouses (Snowflake, BigQuery, Redshift) and works well with tools like Looker, Tableau, and Power BI.

What is a fact table vs dimension table?

A fact table stores quantitative, measurable data (revenue, quantity, clicks). A dimension table stores descriptive, categorical data that provides context (product name, customer region, date). Facts are joined to dimensions via keys.

← Back to Glossary

Last updated: 2026-02-27

SR

Published by

Sainath Reddy

Data Engineer at Anblicks
🎯 4+ years experience