Data modeling is the process of creating a visual representation of how data is structured, stored, and accessed in a system. It defines entities, attributes, relationships, and constraints—serving as the blueprint for databases and data warehouses.
Types of Data Models
1. Conceptual Model
High-level business view:
- Entities and relationships
- No technical details
- Understood by business stakeholders
2. Logical Model
Detailed structure without platform specifics:
- Attributes and data types
- Primary and foreign keys
- Normalization rules
3. Physical Model
Implementation-ready design:
- Table definitions with exact data types
- Indexes and partitions
- Platform-specific optimizations
Data Modeling Techniques
Dimensional Modeling (Kimball)
Optimized for analytics and BI:
- Fact Tables: Metrics and measurements
- Dimension Tables: Descriptive attributes
- Star Schema: Fact table surrounded by dimensions
- Snowflake Schema: Normalized dimensions
Data Vault
Scalable, auditable modeling:
- Hubs: Business keys
- Links: Relationships
- Satellites: Descriptive data with history
One Big Table (OBT)
Modern, denormalized approach:
- Single wide table with all data
- Optimized for cloud warehouses
- Simple, fast queries
Best Practices
1. Start with Business Questions: Model for how data will be used
2. Document Everything: Maintain a data dictionary
3. Think About Change: Design for schema evolution
4. Test with Real Queries: Validate performance early
5. Use dbt for Implementation: Transform raw to modeled data
Common Mistakes
- Over-normalization for analytical workloads
- Ignoring query performance during design
- Not involving business stakeholders
- Skipping documentation