dbt state: Skip Unchanged Nodes, Cut Warehouse Compute 30%

You have a 400-model dbt project. A junior analyst tweaks one source definition. Every. Single. Model. Rebuilds. Ninety minutes later, you’ve burned $1,200 in warehouse compute for a change that affected nothing downstream. That’s the dbt default. It’s also the most expensive habit in the modern data stack.

dbt’s State feature is the principled answer: compare your current project against a previously saved manifest, identify what has actually changed, and run only those models. No guessing. No manual orchestration. No fear.

The feature has been in dbt Core since v0.18, but most teams don’t use it — because manifest management felt clunky. With dbt Cloud, it’s now automatic. With dbt Core, it’s a straightforward S3 upload. And the returns are stark: 60–90% reduction in CI runtime, 30% warehouse compute savings, and developer feedback loops that feel instant instead of hourly.

TL;DR

→ dbt State compares your current project against a saved manifest (JSON) to identify changed models. Run only what’s different.

→ Core selector: `dbt run --select state:modified+ --state ./prod-artifacts`. The `+` rebuilds downstream dependents too.

→ Variants available: `state:modified.body` (SQL changed), `state:modified.configs` (config changed), `state:new` (newly added models).

→ Combine with `--defer` to resolve unchanged upstream models to production instead of rebuilding them. Game-changer for dev workflows.

→ Real runtime savings: 400-model project, CI goes from 48 min (full rebuild) to 6 min (3 changed models with dependents). 87.5% faster.

→ Setup: Persist production manifest to S3/GCS after every run. Download it in CI, add two flags. Takes 20 minutes to wire up.

→ dbt Cloud does this automatically. dbt Core requires DIY manifest management (simple, but manual).

→ Gotcha: Source freshness changes don’t trigger model runs. New columns on upstream models won’t flag downstream models unless explicitly selected.

→ One principle: Compare, don’t guess. The manifest is the source of truth for what changed.

The Problem with Running Everything

Most analytics teams start with small dbt projects. A few models, a `dbt run`, done in seconds. Then the project grows. Hundreds of models. Dozens of sources. Complex DAGs spanning raw ingestion to business-critical marts. Suddenly `dbt run` takes 45 minutes — and you’re running it ten times a day in CI.

The naive solution: run only the models you touched. But doing this manually is error-prone. You forget an upstream dependency. A downstream mart goes stale. You ship broken data. Teams end up caught between speed and correctness, and neither option feels good.

dbt State solves this: automatically identify what changed, run only those models (plus downstream dependents), skip everything else. No manual selection. No guessing. Safe by default.

What dbt State Actually Is

dbt State compares manifests: current vs. prior. Changes detected → rebuild. No changes → skip.

dbt State is the mechanism by which dbt compares your current project against a previously compiled artifact — specifically the manifest.json file — to determine what has actually changed.

The manifest is a JSON file that dbt generates on every `dbt compile` or `dbt run`. It captures a complete snapshot of your project at a point in time: model definitions, compiled SQL, configurations, tests, sources, and the relationships between them.

By diffing the current manifest against a prior one, dbt can identify:

• Models whose SQL has changed
• Models whose configuration has changed (e.g., materialized, tags, meta)
• Models whose upstream dependencies have changed
• New models that didn’t exist before
• Models whose schema or source freshness has changed
• Models that call a macro that has changed

Everything else is left alone.
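To make the diffing idea concrete, here is a minimal Python sketch that classifies nodes as new or modified by comparing two manifest-shaped dictionaries. It assumes each node carries a `checksum` and a `config` entry, as in manifest.json; the function name `diff_manifests` and the sample node IDs are hypothetical, and dbt's real comparison is more granular than this.

```python
def diff_manifests(current: dict, prior: dict) -> dict:
    """Classify nodes in `current` as new or modified relative to `prior`.

    Simplified illustration only: compares the stored checksum and config.
    """
    prior_nodes = prior.get("nodes", {})
    new, modified = [], []
    for unique_id, node in current.get("nodes", {}).items():
        old = prior_nodes.get(unique_id)
        if old is None:
            new.append(unique_id)  # not present in the saved state
        elif (node.get("checksum") != old.get("checksum")
              or node.get("config") != old.get("config")):
            modified.append(unique_id)  # contents or config differ
    return {"new": sorted(new), "modified": sorted(modified)}


# Hypothetical, hand-written manifests (real ones hold far more fields).
prior = {"nodes": {
    "model.shop.stg_orders": {
        "checksum": {"name": "sha256", "checksum": "aaa"},
        "config": {"materialized": "view"},
    },
    "model.shop.orders": {
        "checksum": {"name": "sha256", "checksum": "bbb"},
        "config": {"materialized": "table"},
    },
}}
current = {"nodes": {
    "model.shop.stg_orders": {  # SQL edited, so the checksum differs
        "checksum": {"name": "sha256", "checksum": "ccc"},
        "config": {"materialized": "view"},
    },
    "model.shop.orders": {  # untouched
        "checksum": {"name": "sha256", "checksum": "bbb"},
        "config": {"materialized": "table"},
    },
    "model.shop.customers": {  # did not exist before
        "checksum": {"name": "sha256", "checksum": "ddd"},
        "config": {"materialized": "table"},
    },
}}

result = diff_manifests(current, prior)
print(result)
```

The untouched `orders` model appears in neither list, which is the whole point: anything that matches the saved state is skipped.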

The Core Selector: `state:modified`

The entry point to dbt State is the `state:modified` node selector. It filters your run to only nodes that have changed relative to a saved state:

dbt run --select state:modified --state ./prod-artifacts

Here `./prod-artifacts` is a directory containing the `manifest.json` from your last production run. dbt compares every node in your current project against that manifest and runs only what’s different.

Selector Variants

dbt ships several variants of the selector for fine-grained control:

state:modified — All nodes with any change (SQL, config, schema)
state:modified.body — Only models where the SQL body changed
state:modified.configs — Only nodes where configuration changed
state:modified.persisted_descriptions — Column descriptions changed
state:modified.relation — Relation name or schema changed
state:modified.macros — An upstream macro changed (impacts compiled SQL)
state:new — Entirely new models (didn’t exist in saved state)

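`state:modified` already includes brand-new nodes, so the most common pattern needs only that one selector. (If you ever list `state:new` alongside it, separate the two with a space: in dbt’s selector syntax a comma means intersection, not union.)

dbt run --select "state:modified+" --state ./prod-artifacts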

The trailing `+` means: run all modified nodes and everything downstream of them. This ensures referential integrity — if stg_orders changes, every mart that joins on it will also rebuild.
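As a rough illustration of what the `+` suffix does, the sketch below walks a child map breadth-first to collect everything downstream of the selected nodes. The `child_map` shape (node to list of direct children) mirrors the one stored in manifest.json, but the model names and the helper `with_descendants` are hypothetical, and this is not dbt's own graph code.

```python
from collections import deque


def with_descendants(selected: set[str], child_map: dict[str, list[str]]) -> set[str]:
    """Return the selected nodes plus every node downstream of them."""
    result = set(selected)
    queue = deque(selected)
    while queue:
        node = queue.popleft()
        for child in child_map.get(node, []):
            if child not in result:  # visit each node once
                result.add(child)
                queue.append(child)
    return result


# Hypothetical DAG: stg_orders feeds two models, which both feed a mart.
child_map = {
    "stg_orders": ["orders", "order_items"],
    "orders": ["revenue_mart"],
    "order_items": ["revenue_mart"],
    "stg_customers": ["customers"],
}

to_run = with_descendants({"stg_orders"}, child_map)
print(sorted(to_run))
```

Here `stg_customers` and `customers` are left out because nothing upstream of them changed.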

Whether to use `+` depends on your setup:

Incremental tables downstream: Often safe to skip, since they’ll pick up new rows on the next run anyway.
Full-refresh tables or views downstream: Should be rebuilt if their upstream changes.
Critical reporting models: Should probably always be included for safety.

Most teams use `state:modified+` as the default and carve out exceptions for incremental models.

Real-World Runtime Savings

Cost comparison: Without dbt State (rebuild every model every run: 500 models × 24 hourly runs = 12,000 rebuilds/day = $5,200/month). With dbt State (average 35% fewer models rebuilt, 9% compute efficiency = $4,420/month). Monthly savings: $780. Annual: $9,360.

Runtime reduction depends on project shape, but 60–90% is typical for mature projects.

How much time you save depends on the shape of your project, but the pattern is consistent: most runs in a mature dbt project touch a small fraction of the total model count.

Consider a 400-model project:

Full `dbt run` (no state): 400 models built = 48 minutes
PR touches 3 models: ~20 models run (with `+`) = 6 minutes (87.5% faster)
Hotfix to 1 model: ~8 models run (with `+`) = 2 minutes (95.8% faster)
Daily incremental run: ~15 models run = 4 minutes (91.7% faster)

For large, mature projects, you regularly see 70–90% reductions in CI runtime once state selection is in place.
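The percentages in the 400-model example follow directly from the runtimes. A short Python check, using only the figures quoted above (the variable and function names are just for illustration):

```python
FULL_RUN_MINUTES = 48  # full `dbt run`, no state

# Runtime in minutes for each state-based scenario from the example above.
scenarios = {
    "PR touches 3 models": 6,
    "Hotfix to 1 model": 2,
    "Daily incremental run": 4,
}


def percent_faster(full_minutes: float, partial_minutes: float) -> float:
    """Runtime reduction relative to a full run, as a percentage."""
    return round((1 - partial_minutes / full_minutes) * 100, 1)


savings = {name: percent_faster(FULL_RUN_MINUTES, minutes)
           for name, minutes in scenarios.items()}

for name, pct in savings.items():
    print(f"{name}: {pct}% faster")
```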

Setting It Up in CI/CD

The real power of dbt State emerges in CI/CD pipelines. The pattern is:

1. After every successful production run, upload the manifest.json to a persistent store (S3, GCS, Azure Blob, or an artifact registry).
2. In CI, download the latest production manifest before running dbt.
3. Run dbt with state:modified+ against that manifest.

GitHub Actions Example

jobs:
  dbt-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      # Fetch the manifest saved by the last successful production run
      - name: Download production manifest
        run: |
          aws s3 cp s3://your-bucket/prod/manifest.json ./prod-artifacts/manifest.json

      - name: Install dbt
        run: pip install dbt-core dbt-snowflake

      # state:modified already includes new nodes; + adds downstream dependents
      - name: Run modified models only
        run: |
          dbt run \
            --select "state:modified+" \
            --state ./prod-artifacts \
            --target ci

After Production Run: Upload the Manifest

      - name: Run dbt production
        run: dbt run --target prod

      - name: Upload manifest to S3
        run: |
          aws s3 cp ./target/manifest.json s3://your-bucket/prod/manifest.json

This creates a feedback loop: every successful production run produces the baseline for the next CI comparison.

How dbt Computes “Modified”

Understanding what triggers a `state:modified` match helps you trust the selector and avoid surprises.

dbt decides whether a node is modified by comparing its entry in the current manifest against its entry in the saved one. For models, the comparison covers the model’s code as written in the file (tracked by a checksum in the manifest), its configuration, and the macros it depends on. For sources, it includes the freshness configuration.

If any of these differ between manifests, the node is considered modified. This means:

Whitespace or comment-only edits inside a model file can still flag the model as modified, because the comparison works on file contents rather than on what the query means.
Changes to SQL or Jinja logic DO flag the model.
Configuration changes DO flag the node.
Macro changes propagate: if a macro used by a model changes, the model is flagged as modified.
This is conservative and safe — you might rebuild more than strictly necessary, but you won’t accidentally skip a model that needs to run.
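The idea of hash-based comparison can be sketched in a few lines of Python. This is an illustration of the principle only: the `fingerprint` helper and the choice to hash code and config together are assumptions for the example, not a description of how dbt builds or compares its checksums internally.

```python
import hashlib
import json


def fingerprint(code: str, config: dict) -> str:
    """Hash a node's code and config into one hex digest (illustrative only)."""
    # sort_keys makes the digest independent of dict key order.
    payload = json.dumps({"code": code, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


sql = "select order_id, amount from {{ ref('stg_orders') }}"

saved = fingerprint(sql, {"materialized": "view"})
unchanged = fingerprint(sql, {"materialized": "view"})
config_changed = fingerprint(sql, {"materialized": "table"})
sql_changed = fingerprint(sql + " where amount > 0", {"materialized": "view"})

print(saved == unchanged)       # identical inputs give identical digests
print(saved == config_changed)  # any difference in the inputs changes the digest
print(saved == sql_changed)
```

Comparing digests is cheap and deterministic, which is why it works well as a "did anything change" signal across hundreds of nodes.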

Combining State with `--defer`

--defer is a closely related feature that pairs naturally with --state. While state:modified controls what you run, --defer controls where dbt looks for relations that you aren’t running.

dbt run \
  --select "state:modified+" \
  --state ./prod-artifacts \
  --defer \
  --target dev

With --defer, when model A references model B and B is not being run (because it’s unchanged), dbt resolves the ref('B') to the production relation instead of the development one. This means your CI or dev runs don’t need a full copy of the warehouse — they can borrow production tables for anything they’re not rebuilding.

The combination is transformative for developer workflows:

• Developers run only the models they changed.
• Unchanged upstream models resolve to production.
• No need to seed or pre-build the entire project in a dev schema.
• Full isolation — changes don’t interfere with each other.
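A toy model of the deferral rule described above, in Python: refs to models in the current selection resolve to the dev schema, everything else resolves to production. The function `resolve_ref` and the schema names are hypothetical, and dbt's actual resolution logic considers more than this (this sketch only captures the basic rule).

```python
def resolve_ref(model: str, selected: set[str], dev_schema: str, prod_schema: str) -> str:
    """Pick the relation a ref() should point at under a simplified --defer rule."""
    # Models being built in this run live in dev; the rest are borrowed from prod.
    schema = dev_schema if model in selected else prod_schema
    return f"{schema}.{model}"


# Hypothetical run: only `orders` was modified, so only it is selected.
selected = {"orders"}

upstream = resolve_ref("stg_orders", selected, "dev_alice", "analytics")
rebuilt = resolve_ref("orders", selected, "dev_alice", "analytics")

print(upstream)  # unchanged upstream model, read from production
print(rebuilt)   # modified model, built and read in the dev schema
```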

The Gotchas Nobody Mentions

Source freshness changes don’t trigger model runs. `state:modified` on sources reflects freshness configuration changes, not the actual data changing. If you change a source’s freshness window from 1 hour to 2 hours, that’s a config change, and the source will be flagged as modified. But downstream models won’t automatically rebuild just because the source data has changed. dbt assumes downstream models will be rebuilt on schedule or on demand.

New columns on upstream models won’t flag downstream models. If an upstream model adds a column but its SQL otherwise produces the same results, the downstream model won’t be flagged as modified — even if your downstream model does SELECT *. For this reason, avoid `SELECT *` in critical production models. Be explicit about column selection.

The manifest must match the target environment. The saved manifest should come from a run against the same target (e.g., production). Using a manifest from a different environment (e.g., a dev manifest) can produce incorrect change detection.

First run has no baseline. On first use, there’s no prior manifest. Either run everything once to establish the baseline, or use dbt Cloud’s built-in state management, which handles this automatically.
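One way to handle the missing baseline in a dbt Core pipeline is to fall back to a full run when no saved manifest is found. The Python sketch below only builds the argument list; the helper name `build_dbt_args` and the directory layout are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def build_dbt_args(state_dir: Path, target: str) -> list[str]:
    """Use state selection when a saved manifest exists, else run everything."""
    args = ["dbt", "run", "--target", target]
    if (state_dir / "manifest.json").is_file():
        args += ["--select", "state:modified+", "--state", str(state_dir)]
    return args


with tempfile.TemporaryDirectory() as tmp:
    state_dir = Path(tmp)
    first_run = build_dbt_args(state_dir, "ci")   # no baseline yet
    (state_dir / "manifest.json").write_text("{}")
    later_run = build_dbt_args(state_dir, "ci")   # baseline available

print(first_run)
print(later_run)
```

The resulting list can be passed to `subprocess.run` in a CI script, so the first pipeline run establishes the baseline and later runs use it.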

Manifest compatibility across dbt versions. If you upgrade dbt Core between runs, the manifest schema might change, and the comparison might fail. Always keep your CI environment and production environment on the same dbt version (or be very careful when upgrading).
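A cheap guard against version drift is to compare the dbt version recorded in each manifest's `metadata` block before relying on the comparison. A minimal Python sketch, assuming the manifests have been loaded into dictionaries (the helper name and sample version strings are hypothetical):

```python
def dbt_versions_match(manifest_a: dict, manifest_b: dict) -> bool:
    """True only if both manifests record the same, non-missing dbt version."""
    version_a = manifest_a.get("metadata", {}).get("dbt_version")
    version_b = manifest_b.get("metadata", {}).get("dbt_version")
    return version_a is not None and version_a == version_b


# Hypothetical metadata blocks; real manifests contain many more keys.
prod_manifest = {"metadata": {"dbt_version": "1.7.4"}}
ci_manifest_ok = {"metadata": {"dbt_version": "1.7.4"}}
ci_manifest_drifted = {"metadata": {"dbt_version": "1.8.0"}}

print(dbt_versions_match(prod_manifest, ci_manifest_ok))
print(dbt_versions_match(prod_manifest, ci_manifest_drifted))
```

A CI job could fail fast, or fall back to a full run, when this check returns False.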

When It Breaks (And How to Fix It)

Scenario: “state:modified found no changes, but I know I changed something.”

dbt compares each node’s contents and configuration against the saved manifest, not file modification times. If your change didn’t alter a node’s own code or configuration, dbt won’t see it as modified. This is rare but can happen if you:

• Changed the value of a `var` or `env_var` that a model reads, without touching the model itself
• Changed a variable used only in a non-dbt file
• Updated a macro without using it in a model

Solution: Explicitly select the model with `--select model_name` to force a rebuild.

Scenario: “My dbt Cloud runs are state-aware, but my local development isn’t.”

dbt Cloud automatically manages state. Local development requires you to download a manifest and point to it. If you’re toggling between the two, you might accidentally run full rebuilds locally. Solution: Set up manifest downloads locally too (or use the dbt Cloud CLI).

dbt Cloud vs. dbt Core State Management

dbt Cloud: Automatically persists manifests from prior runs and exposes deferral as a setting in the UI. Zero setup.

dbt Core: Requires you to manually persist the manifest (S3, GCS, etc.) and download it in CI. More work, but straightforward with any object store. Takes about 20 minutes to wire up.

For teams on dbt Core, the manifest management is DIY but simple. For dbt Cloud users, it’s automatic — one less thing to maintain.

The Real Cost Math

Assume a 400-model Snowflake project, hourly CI runs:

Without state: 400 models × 24 runs/day = 9,600 models built/day = $4,800/month

With state: Average 60% skip rate = 3,840 models built/day = $1,920/month

Monthly savings: $2,880 | Annual: $34,560

And this doesn’t count the developer time saved from faster CI feedback loops. A team running 10 PRs a day, each waiting 6 minutes for CI instead of 45, saves 390 person-minutes per day. Over a year of roughly 240 working days, that’s 1,560 hours of developer time.
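The arithmetic above can be reproduced in a few lines of Python. All inputs come from the figures in this section, except the 240 working days per year, which is an assumption implied by the 1,560-hour total.

```python
MODELS = 400
RUNS_PER_DAY = 24
MONTHLY_COST_FULL = 4800        # dollars, without state
SKIP_PERCENT = 60               # average share of models skipped with state

models_full = MODELS * RUNS_PER_DAY                          # models built per day
models_with_state = models_full * (100 - SKIP_PERCENT) // 100
monthly_cost_state = MONTHLY_COST_FULL * models_with_state // models_full
monthly_savings = MONTHLY_COST_FULL - monthly_cost_state
annual_savings = monthly_savings * 12

PRS_PER_DAY = 10
CI_MINUTES_FULL = 45
CI_MINUTES_STATE = 6
WORKING_DAYS_PER_YEAR = 240     # assumption, not stated in the original figures

minutes_saved_per_day = PRS_PER_DAY * (CI_MINUTES_FULL - CI_MINUTES_STATE)
hours_saved_per_year = minutes_saved_per_day * WORKING_DAYS_PER_YEAR // 60

print(models_full, models_with_state)
print(monthly_cost_state, monthly_savings, annual_savings)
print(minutes_saved_per_day, hours_saved_per_year)
```

Swapping in your own model count, skip rate, and monthly bill gives a quick first estimate for your project.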

The One Principle

Compare, don’t guess. The manifest is the source of truth for what changed. dbt State removes the need for manual orchestration, custom scripts, or human judgment about what to rebuild. Compare the current project against a prior snapshot, run only what’s different, trust the math. That’s the entire philosophy.
