I’ve been running dbt in production for a while now. And I’ll be honest — there was a phase where I genuinely believed that if my dbt tests were green, I was good. Green means clean, right?

Wrong.

This is the quiet failure mode that nobody in the dbt community writes about loudly enough. Your tests pass. Your CI/CD pipeline goes green. Your DAG runs without errors. And somewhere downstream, an analyst is staring at a revenue number that’s off by 30% and has no idea why.


TL;DR: dbt’s built-in tests (not_null, unique, accepted_values, relationships) validate data structure, not data correctness. Your pipeline goes green and you still ship wrong numbers. This post breaks down exactly why that happens, what the real gaps are, and what custom tests, volume monitoring, and source-layer checks actually fix.


Let me walk you through exactly how this happens — because I’ve lived it.


What Are dbt Tests Actually Checking?

Before we get to the failure modes, let’s be precise about what dbt’s generic tests actually do — because I think the confusion starts here.

dbt gives you four built-in generic tests out of the box:

  • not_null — checks that a column has no null values
  • unique — checks that all values in a column are distinct
  • accepted_values — checks that a column only contains values from a predefined list
  • relationships — checks referential integrity between two models

These are constraint tests. They validate the shape of your data — grain, nullability, referential integrity. They do not validate whether the values are correct, whether the volume is expected, or whether the business logic in your SQL is actually right.

That distinction is everything.

models:
  - name: fct_daily_revenue
    columns:
      - name: transaction_id
        tests:
          - not_null
          - unique
      - name: revenue_amount
        tests:
          - not_null

This test suite passes even if every revenue_amount is 100x too large. It passes if your join silently drops 40% of records because a key format changed upstream. It passes if a currency unit changed after a vendor migration and nobody touched the schema.

None of that is a bug in dbt. It’s working exactly as designed. The problem is the mental model we build around it.


The Scenario That Broke Me

We had a pipeline pulling sales transaction data from an API. The dbt model joined it against a product dimension, aggregated daily revenue, and pushed it to a reporting layer. All four generic tests — passing. Every single day.

What was actually happening: the upstream API started returning amounts in a different currency unit after a vendor migration. No schema change. No new nulls. No duplicate keys. Just the values silently shifting by a factor of 100.

Our not_null test on revenue_amount? Passed. Our unique test on transaction_id? Passed. Our downstream revenue dashboard was off by two orders of magnitude for three weeks before an analyst caught it during a QBR.

Three weeks. All green tests. All wrong data.

That’s when I stopped treating dbt tests as a data quality guarantee and started treating them as what they actually are: a contract enforcement layer.


The Three Gaps Nobody Talks About

1. Volume Drift — Records Disappear and Nothing Breaks

If your fct_orders model typically produces 50,000 rows a day and one morning it produces 12,000 — no generic test will catch that. The data that is there is perfectly valid. You just lost 38,000 records somewhere in your pipeline and dbt has no idea.

This is one of the most common real-world pipeline failures I see, and it’s completely invisible to constraint-based tests.

The fix is a custom singular test or a dbt_utils recency/row-count assertion:

-- tests/assert_row_count_within_threshold.sql
{% set threshold = 0.2 %}

select 1
from (
  select count(*) as today_count
  from {{ ref('fct_orders') }}
  where order_date = current_date
) today
cross join (
  select avg(daily_count) as avg_count
  from (
    select order_date, count(*) as daily_count
    from {{ ref('fct_orders') }}
    where order_date between current_date - 14 and current_date - 1
    group by order_date
  ) history
) baseline
where abs(today_count - avg_count) / nullif(avg_count, 0) > {{ threshold }}

This returns a row — which dbt interprets as a test failure — when today’s row count deviates more than 20% from the 14-day average. Simple, practical, catches real failures. I also wrote about a similar pattern in how dbt integrates natively with Apache Airflow for pipeline orchestration — the combination of orchestration visibility and volume tests gives you a much more honest picture of pipeline health than either alone.

2. Business Logic Correctness — The Math Can Still Be Wrong

dbt tests validate columns in isolation. They don’t validate relationships between columns, or whether the calculations in your model are actually right.

Take something simple:

select
  order_id,
  unit_price,
  quantity,
  unit_price * quantity as line_total
from {{ source('orders', 'order_lines') }}

You can have not_null on all three columns and a positivity check on quantity (accepted_values won’t do that, by the way — it checks membership in a fixed list, so for “greater than zero” you’d reach for dbt_utils.accepted_range) — and still ship models where line_total is wrong because unit_price was populated in cents from one source and dollars from another. No generic test catches that unless you explicitly write:

-- tests/assert_line_total_matches_components.sql
select *
from {{ ref('fct_order_lines') }}
where abs(line_total - (unit_price * quantity)) > 0.01

Writing that test requires you to already know the business rule. Which means data quality at this layer requires domain knowledge, not just dbt knowledge. If you’re using Snowflake, pairing this with Cortex-based automated data quality checks can flag anomalies in derived metrics that pure SQL assertion tests would miss — something I covered in depth when building Snowflake Cortex accelerators for automated data quality.

3. Silent Join Fan-Out and Record Loss

This one has bitten me more than once. A many-to-one join accidentally becomes many-to-many because a dimension table you assumed was unique… wasn’t. Or a left join silently drops records because a key format changed from integer to string somewhere upstream.

The result: your fact table either fans out (double-counting revenue) or silently loses records, and every generic test still passes because the columns that remain are perfectly valid.

The safeguard is writing uniqueness tests on your dimension tables and asserting that your fact-to-dimension join doesn’t increase row count:

-- tests/assert_no_join_fanout.sql
with before_join as (
  select count(*) as row_count from {{ ref('fct_orders') }}
),
after_join as (
  select count(*) as row_count
  from {{ ref('fct_orders') }} o
  left join {{ ref('dim_customers') }} c on o.customer_id = c.customer_id
)
select 1
from before_join b
cross join after_join a
where a.row_count > b.row_count
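
The first half of that safeguard — uniqueness on the dimension — is the cheaper check, and worth declaring explicitly in schema.yml. A sketch, reusing the dim_customers name from the example above:

```yaml
# models/schema.yml
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
```

If dim_customers ever sprouts a duplicate customer_id, this fails before the fanout reaches your fact table at all.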

What Actually Helps

Write custom singular tests for critical models. Don’t rely only on generic column-level tests for anything that feeds a financial or executive dashboard. If the number matters, test the business rule explicitly.

Add volume and freshness monitoring at source. Whether you use dbt_utils.recency, Elementary, or a hand-rolled SQL assertion — track volume. It’s the cheapest signal you have that something went wrong upstream.

sources:
  - name: raw_transactions
    tables:
      - name: transactions
        tests:
          - dbt_utils.recency:
              datepart: hour
              field: created_at
              interval: 3
        columns:
          - name: amount
            tests:
              - dbt_utils.accepted_range:
                  min_value: 0
                  max_value: 1000000

Test at source, not just at the model layer. If an upstream format changes, you want the failure at ingestion, not after three transformation layers have already propagated it downstream.

Use dbt_utils and Elementary seriously. The dbt_utils package has range tests, expression tests, and recency checks that fill a lot of the structural gaps. Elementary adds anomaly detection on top of that, which gets you closer to actual data observability rather than just constraint validation.
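
For cross-column rules that don’t warrant a full singular test file, dbt_utils.expression_is_true lets you state the invariant inline. A sketch, assuming the fct_order_lines columns from earlier:

```yaml
models:
  - name: fct_order_lines
    tests:
      - dbt_utils.expression_is_true:
          expression: "abs(line_total - (unit_price * quantity)) <= 0.01"
```

Same assertion as the singular test version, but it lives next to the model definition where reviewers will actually see it.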

Review your SQL, not just your CI badge. Every model that feeds a critical metric should have a comment explaining the expected grain, the join logic, and the expected value ranges. Future you — and the next engineer — will thank you when something breaks at 2am.
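
What that header comment might look like in practice — the specifics here are illustrative, not a prescription:

```sql
-- models/marts/fct_daily_revenue.sql
-- Grain: one row per (order_date, product_id)
-- Joins: fct_orders LEFT JOIN dim_products on product_id
--        (dim_products must be unique on product_id -- see its unique test)
-- Units: revenue_amount is USD dollars, not cents
-- Volume: ~50k source rows/day; guarded by tests/assert_row_count_within_threshold.sql
select
  ...
```

Thirty seconds to write, and it turns a 2am incident from archaeology into a diff against stated expectations.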


The Mindset Shift

I had to reframe how I think about dbt tests. They’re not a data quality guarantee. They’re a contract enforcement layer. They ensure your data meets its structural promises. That’s genuinely useful — but it’s not the same as ensuring your data is correct.

Real data quality requires a combination of:

  • Structural tests — what dbt gives you natively (constraint validation)
  • Business logic tests — custom singular tests you write based on domain knowledge
  • Volume and freshness monitoring — dbt_utils, Elementary, or your own row count assertions
  • Code review culture — someone actually looks at the SQL, not just whether CI passed

The green checkmark in your pipeline is not permission to stop thinking. It’s permission to look at the next layer of potential failure.

I spent a long time treating dbt tests as a safety net. They’re more like a fence — useful, visible, and completely ineffective against threats that don’t come through the gate.


Frequently Asked Questions

Do dbt tests guarantee data quality?

No. dbt’s built-in generic tests — not_null, unique, accepted_values, relationships — validate structural constraints on your data. They confirm that a column has no nulls, that keys are unique, or that values fall within an expected set. They do not verify whether the actual values are correct, whether business logic in your SQL is right, or whether record volumes are within expected ranges. For genuine data quality coverage, you need custom singular tests, volume monitoring, and source-layer assertions alongside dbt’s native tests.

What is the difference between dbt generic tests and singular tests?

Generic tests in dbt are reusable, schema-defined checks applied to columns across multiple models — not_null and unique are the most common. Singular tests are standalone SQL queries that you write specifically for a model or business rule: they return rows when something is wrong and pass when they return no rows. Singular tests are where you validate business logic — things like “line_total should always equal unit_price × quantity” or “today’s row count should be within 20% of the 14-day average.” Both types live in your tests/ directory and run with dbt test.

Can dbt catch silent record loss in joins?

Not automatically. If a join accidentally drops records — due to a key format change, a null key, or a mismatched data type — dbt’s generic tests won’t flag it unless you’ve explicitly written a test to assert row count consistency before and after the join. This is one of the most common silent failure modes in production dbt pipelines. Writing a custom singular test that compares pre- and post-join row counts is the most reliable way to catch it.

How do I monitor row count changes in dbt?

There are a few approaches. The dbt_utils package includes a recency test for freshness monitoring. For volume, you can write a custom singular test that compares today’s row count against a rolling average from the past 14 days — any deviation beyond a threshold (say 20%) triggers a failure. For more automated anomaly detection across all your models, Elementary integrates directly with dbt and adds statistical monitoring without requiring you to write individual volume tests for every model.
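
With Elementary installed as a dbt package, the hand-rolled volume check becomes a one-line declaration. A sketch — check the Elementary docs for your version’s exact configuration options:

```yaml
models:
  - name: fct_orders
    tests:
      - elementary.volume_anomalies:
          timestamp_column: order_date
```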

What is the best way to test business logic in dbt?

Write singular tests that encode the business rule explicitly in SQL. For example, if your model calculates revenue = quantity × unit_price, write a test that queries the model and returns rows where abs(revenue - (quantity * unit_price)) > 0.01. If there are cross-column invariants — like a refund amount should never exceed the original transaction amount — write that as a test too. The key insight is that these tests require domain knowledge: you need to know what correct looks like before you can assert it. That’s a conversation between data engineers and the business teams who own the metrics.
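
The refund invariant mentioned above, written as a singular test. The model and column names here are assumptions for illustration:

```sql
-- tests/assert_refund_not_exceeding_original.sql
-- Fails (returns rows) when a refund is larger than the transaction it refunds.
select
  r.refund_id,
  r.refund_amount,
  t.transaction_amount
from {{ ref('fct_refunds') }} r
join {{ ref('fct_transactions') }} t
  on r.transaction_id = t.transaction_id
where r.refund_amount > t.transaction_amount
```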

Does dbt have built-in anomaly detection?

dbt itself does not include statistical anomaly detection. The core framework focuses on constraint-based testing. For anomaly detection — flagging unexpected spikes, drops, or distribution shifts in your data — you need either the Elementary package, which sits on top of dbt and adds automated monitoring, or a dedicated data observability platform like Monte Carlo, Soda, or Bigeye. In Snowflake environments specifically, combining dbt with Cortex-based quality checks can add an AI-assisted layer on top of your existing test suite.


The Honest Closing

The reason nobody talks loudly about this is that it’s uncomfortable. We build testing frameworks because they give us confidence. Admitting that green tests can coexist with broken data means admitting that the confidence was partly false.

But I’d rather have that honest conversation in a blog post than explain to a VP why the quarterly revenue numbers were wrong — and then pull up a CI pipeline that was green the whole time.

Write the custom tests. Monitor the volumes. Test the business rules. Trust the process, not just the color of the badge.

