dbt Best Practices for Production Projects

IntermediateLast updated: 2026-04-09 • 4 sections

Battle-tested dbt best practices for project structure, model design, testing, performance, and CI/CD. Based on real-world production deployments.

Project Structure

Testing Strategy

Incremental Model Patterns

-- Standard incremental with merge
{{ config(
    materialized='incremental',
    unique_key='order_id',
    on_schema_change='sync_all_columns'
) }}

SELECT
    order_id,
    customer_id,
    order_total,
    updated_at
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
  WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}

-- Incremental with delete+insert (for partitioned data)
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='date_day'
) }}

SELECT
    date_day,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT user_id) AS unique_users
FROM {{ ref('int_daily_events') }}
{% if is_incremental() %}
  WHERE date_day >= DATEADD('day', -3, CURRENT_DATE)
{% endif %}
GROUP BY date_day

Performance & Deployment

Frequently Asked Questions

Should I use dbt Core or dbt Cloud?

dbt Core is free, open-source, and gives you full control — ideal if you have engineering resources to manage CI/CD, scheduling, and infrastructure. dbt Cloud adds a managed IDE, job scheduler, documentation hosting, and built-in CI. Use dbt Cloud if you want faster onboarding and less infrastructure management. Many teams use Core in CI/CD pipelines and Cloud for development.

How do I handle late-arriving data in incremental models?

Use a lookback window: WHERE updated_at > DATEADD('day', -3, (SELECT MAX(updated_at) FROM {{ this }})). This reprocesses the last 3 days of data on every run, catching late arrivals. Combine with unique_key for proper merge behavior. The tradeoff is slightly higher compute cost for better data accuracy.

What is the difference between ref() and source()?

ref() references another dbt model — dbt handles the database/schema resolution and builds a dependency graph. source() references a raw table defined in sources.yml — it doesn't create a dependency on another model, but enables lineage tracking and freshness monitoring. Rule: use source() for the first layer (staging), ref() everywhere else.

Related Cheat Sheets

dbt Commands Cheat SheetSnowflake Best Practices for Data EngineersApache Airflow Best Practices for Production
← All Cheat Sheets