TL;DR

  • Snowflake’s 60-second minimum billing means a 4-second query gets charged for a full minute — you’re paying for 56 seconds of nothing
  • You can query Snowflake data in DuckDB via two routes: Iceberg tables on S3 (no warehouse needed) or ADBC using Apache Arrow (up to 38x faster than ODBC)
  • Once data is local in DuckDB, every subsequent query is free — no cloud credits consumed
  • A hybrid triage approach (short queries → DuckDB/MotherDuck, heavy ETL → Snowflake) cuts BI compute costs by 70–90% in practice
  • Dev and CI/CD workloads moved to local DuckDB eliminate an entire category of cloud spend entirely

I’ve been building on Snowflake long enough to know the ritual. Warehouse wakes up. Query runs in three seconds. Warehouse idles. You get billed for sixty seconds anyway. Multiply that by every analyst, every BI dashboard refresh, every dbt run in your dev environment — and suddenly you’re staring at a bill that feels completely disconnected from the actual work that happened.

For a long time I assumed this was just the price of doing business on a best-in-class cloud warehouse. What I didn’t realise — until I started taking DuckDB seriously — is that a meaningful chunk of that bill doesn’t have to exist at all.

This article covers three concrete methods to get Snowflake data into DuckDB, the cost math behind why you’d want to, and how to decide what actually belongs on which engine.


THE REAL PROBLEM: YOU’RE PAYING FOR COMPUTE YOU DIDN’T USE

Snowflake bills compute per second — but only after a 60-second minimum each time a warehouse resumes from suspension. A query that takes five seconds gets billed for a full minute. You paid for 55 seconds of nothing.

It gets worse at scale. When a BI dashboard fires 20 queries on load, each taking three seconds, and each one triggers its own warehouse resume (the worst case, but common when queries spread across per-user or per-tool warehouses), that single page view racks up 1,200 seconds of billed compute time. The actual work? One minute.

And then warehouse sizing compounds it further. Each size increase in Snowflake doubles credit consumption. Teams defaulting to Medium or Large for everything are paying a 4x to 8x cost premium for workloads that could run perfectly well on X-Small.
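To make that concrete, here's a small sketch of the billing maths. The credit rates (X-Small = 1 credit/hour, doubling per size) match Snowflake's model; the $3.00 per-credit price is purely illustrative, since the real rate depends on your edition and region:

```python
# Sketch of Snowflake billing for one query on a freshly resumed warehouse.
# Per-credit price is illustrative; credit rates double with each size step.

CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8}
PRICE_PER_CREDIT = 3.00  # USD, hypothetical

def billed_cost(query_seconds: float, size: str) -> float:
    """Cost of one query after a resume: billed seconds floor at 60."""
    billed_seconds = max(query_seconds, 60)
    credits = CREDITS_PER_HOUR[size] * billed_seconds / 3600
    return credits * PRICE_PER_CREDIT

# A 4-second query is billed as a full minute, and the warehouse size
# multiplies the damage:
xs = billed_cost(4, "XS")
m = billed_cost(4, "M")
print(f"X-Small: ${xs:.4f}, Medium: ${m:.4f}, premium: {m / xs:.0f}x")
```

Pennies per query, but multiplied across every dashboard refresh and dbt run it becomes the bill that feels disconnected from the work.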

I’ve seen this exact pattern on almost every Snowflake environment I’ve worked in. Oversized warehouse, auto-suspend set to ten minutes, no resource monitors, nobody looking at query history.


QUICK WINS INSIDE SNOWFLAKE FIRST

Before touching the architecture, fix the obvious things. These alone can cut spend by 20–40%.

Set AUTO_SUSPEND to exactly 60 seconds. Not lower: below 60 the warehouse suspends before its billed minimum is even used up, so a query arriving moments later forces a fresh resume and a second 60-second minimum. Not higher: every idle second past 60 is wasted money.
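To see why, here's a toy simulation, assuming two 3-second queries arriving 30 seconds apart (the timings and the model are simplified, but the per-resume minimum is the real rule):

```python
def billed_seconds(query_starts, query_len, auto_suspend):
    """Total billed seconds: per-second billing with a 60s minimum per
    resume, plus the idle tail until the warehouse auto-suspends."""
    sessions = []
    for start in query_starts:
        end = start + query_len
        if sessions and start <= sessions[-1][1] + auto_suspend:
            # Warehouse still awake: extend the current billing session
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            # Warehouse suspended: fresh resume, fresh 60s minimum
            sessions.append([start, end])
    return sum(max(60, end + auto_suspend - start) for start, end in sessions)

# Two 3-second queries, 30 seconds apart:
print(billed_seconds([0, 30], 3, auto_suspend=10))  # 120: two resumes
print(billed_seconds([0, 30], 3, auto_suspend=60))  # 93: one session
```

With a 10-second auto-suspend you pay two 60-second minimums; with 60 seconds the warehouse is still awake for the second query and you pay once.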

Default to X-Small warehouses. Only scale up when a specific workload has a documented SLA that requires it.

Add resource monitors:

CREATE OR REPLACE RESOURCE MONITOR monthly_etl_monitor
    WITH CREDIT_QUOTA = 5000
    TRIGGERS ON 75 PERCENT DO NOTIFY
             ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE etl_heavy_wh
SET RESOURCE_MONITOR = monthly_etl_monitor;


METHOD 1 — QUERYING SNOWFLAKE ICEBERG TABLES DIRECTLY IN DUCKDB

If your organisation has moved to Iceberg tables with underlying data stored in S3, you can read those tables directly in DuckDB — no Snowflake warehouse running, no credits consumed.

Install the extensions:

INSTALL httpfs;
LOAD httpfs;
INSTALL iceberg;
LOAD iceberg;

Configure AWS credentials:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);

Find the current metadata file for your Snowflake-managed Iceberg table:

SELECT PARSE_JSON(
    SYSTEM$GET_ICEBERG_TABLE_INFORMATION('YOUR_DB.YOUR_SCHEMA.YOUR_TABLE')
)['metadataLocation']::varchar;

Query it in DuckDB:

SELECT
    customer_id,
    COUNT(*)
FROM iceberg_scan('s3://your-bucket/path/to/metadata/00001-xxxx.metadata.json')
GROUP BY 1;

Materialise once for fast repeated queries:

CREATE TABLE payments AS
SELECT * FROM iceberg_scan('s3://your-bucket/…/metadata.json');

After this, the same aggregation runs in 1.5s instead of 54s.

Real benchmark: a SELECT * on a 110-million-row table finished in 29 seconds in DuckDB on an M1 MacBook. The same query on an X-Small Snowflake warehouse took 72 seconds.

The honest limitation: DuckDB’s Iceberg support is still maturing. You need direct S3 access and have to point DuckDB at a specific metadata file rather than a catalog. This will improve over time, but it works today.


METHOD 2 — QUERYING NATIVE SNOWFLAKE TABLES VIA ADBC

Not on Iceberg yet? ADBC (Arrow Database Connectivity) is the right tool here.

Apache Arrow is a columnar memory format. When you connect Snowflake to DuckDB via ADBC, data stays columnar the entire way. Traditional ODBC forces Snowflake to convert columnar → row for transfer, then DuckDB converts row → columnar for processing. DuckDB’s benchmarks show ADBC is up to 38x faster than ODBC.

Install:

pip install adbc_driver_snowflake pyarrow duckdb cryptography

Connect to Snowflake and pull data as an Arrow table:

import os

import adbc_driver_snowflake.dbapi
import duckdb
from read_private_key import read_private_key

# Re-encode the key-pair auth private key to PEM (helper shown below);
# the env var name for the key path is illustrative
pem_key = read_private_key(os.getenv('SNOWFLAKE_PRIVATE_KEY_PATH'))

SNOWFLAKE_CONFIG = {
    'adbc.snowflake.sql.account': os.getenv('SNOWFLAKE_ACCOUNT'),
    'adbc.snowflake.sql.warehouse': os.getenv('SNOWFLAKE_WAREHOUSE'),
    'adbc.snowflake.sql.role': os.getenv('SNOWFLAKE_ROLE'),
    'adbc.snowflake.sql.database': os.getenv('SNOWFLAKE_DATABASE'),
    'username': os.getenv('SNOWFLAKE_USER'),
    'adbc.snowflake.sql.client_option.jwt_private_key_pkcs8_value': pem_key,
    'adbc.snowflake.sql.auth_type': 'auth_jwt'
}

snowflake_conn = adbc_driver_snowflake.dbapi.connect(
    db_kwargs={**SNOWFLAKE_CONFIG}
)

snowflake_cursor = snowflake_conn.cursor()
snowflake_cursor.execute("SELECT * FROM SANDBOX_DB.MY_SCHEMA.RAW_ORDERS")

# Fetch as Arrow table — stays columnar, no serialisation overhead
arrow_table = snowflake_cursor.fetch_arrow_table()

# Persist locally in DuckDB; the replacement scan resolves the
# arrow_table Python variable by name
duckdb_conn = duckdb.connect('demo.db')
duckdb_conn.execute("""
    CREATE TABLE IF NOT EXISTS raw_orders AS
    SELECT * FROM arrow_table
""")

One heads-up: figuring out the connection parameters using a private key is not straightforward — the docs aren’t great on this point. The private key needs to be re-encoded into PEM format before passing it to the ADBC driver:

from cryptography.hazmat.primitives import serialization

def read_private_key(private_key_path: str, private_key_passphrase: str = None) -> str:
    with open(private_key_path, 'rb') as key_file:
        private_key = serialization.load_pem_private_key(
            key_file.read(),
            password=private_key_passphrase.encode() if private_key_passphrase else None
        )
    pem_key = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption()
    )
    return pem_key.decode('utf-8')

Once that’s sorted, the workflow is clean: pull data from Snowflake via ADBC once, materialise it locally in DuckDB, query it as many times as you want — zero Snowflake credits consumed after the initial pull.


METHOD 3 — THE HYBRID ARCHITECTURE: ROUTE WORKLOADS BY TYPE

The two methods above are great for development and ad-hoc analysis. For production BI workloads, the cleanest solution I’ve seen is a hybrid architecture where you triage queries by workload type.


The insight that unlocked this for me was using Snowflake’s query_history to actually categorise what’s running:

WITH query_stats AS (
    SELECT
        warehouse_name,
        user_name,
        query_id,
        execution_time / 1000 AS execution_seconds
    FROM snowflake.account_usage.query_history
    WHERE
        start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
        AND warehouse_name IS NOT NULL
        AND execution_status = 'SUCCESS'
)
SELECT
    warehouse_name,
    user_name,
    COUNT(query_id) AS query_count,
    MEDIAN(execution_seconds) AS median_execution_seconds,
    CASE
        WHEN query_count > 1000 AND median_execution_seconds < 30
            THEN 'Interactive BI / High Frequency'
        WHEN query_count <= 1000 AND median_execution_seconds < 60
            THEN 'Ad-Hoc Exploration'
        WHEN median_execution_seconds >= 300
            THEN 'Batch ETL / Heavy Analytics'
        ELSE 'General Purpose'
    END AS workload_category
FROM query_stats
GROUP BY warehouse_name, user_name
ORDER BY query_count DESC;

Use MEDIAN not AVG — outlier queries skew the average and give a misleading picture of typical duration.
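A quick illustration with invented numbers: nine snappy dashboard queries and one runaway job are enough to make the average lie.

```python
import statistics

# Nine fast dashboard queries plus one runaway outlier, in seconds
execution_seconds = [2, 3, 2, 4, 3, 2, 3, 4, 2, 600]

print(statistics.mean(execution_seconds))    # 62.5: looks like a slow workload
print(statistics.median(execution_seconds))  # 3.0: the typical query is fast
```

Categorise on the mean and this workload lands in "Ad-Hoc Exploration"; categorise on the median and it's correctly flagged as high-frequency interactive BI.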

Routing logic:

  • Short and bursty BI (sub-30s, high frequency) → move to a usage-based engine. Real math: $528/month on a Snowflake X-Small running continuously vs $5.87/month on per-second billing for the same workload.
  • Dev and CI/CD → local DuckDB, zero cloud credits
  • Heavy batch ETL, multi-TB → keep on Snowflake, 60s minimum is irrelevant for hour-long jobs
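The same routing logic, sketched as a function. The thresholds mirror the categorisation query above; the function and the engine labels are just illustrative:

```python
def route_workload(query_count: int, median_seconds: float) -> str:
    """Route a (warehouse, user) workload to an engine by its profile."""
    if median_seconds >= 300:
        return "snowflake"          # heavy batch ETL: 60s minimum is noise
    if query_count > 1000 and median_seconds < 30:
        return "duckdb/motherduck"  # short, bursty BI: idle tax dominates
    return "review"                 # general purpose: inspect before moving

print(route_workload(5000, 3))    # duckdb/motherduck
print(route_workload(12, 2400))   # snowflake
```

The "review" bucket is deliberate: anything that doesn't clearly fit either extreme deserves a look at its query history before you move it.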

This is the same principle I apply when thinking about orchestration — use the right tool for the job, not the most powerful tool for everything. I wrote about a similar decision process in why I stopped using Snowflake Tasks for orchestration — the short version is that mature orchestration tools give you far more control over exactly this kind of workload routing.


WHEN TO STAY ON SNOWFLAKE

Multi-terabyte batch processing — predictable provisioned compute matters more than idle cost savings when a job runs for hours.

Enterprise governance — complex data masking, RBAC at scale, data residency requirements. Snowflake’s security surface is mature. DuckDB isn’t designed for this.

Already-efficient workloads — if a warehouse runs at high utilisation for 8 hours straight, there’s no idle tax to eliminate. Don’t fix what isn’t broken.


WHAT REAL COST SAVINGS LOOK LIKE

  • One SaaS company: 70%+ reduction in warehousing costs after moving to DuckDB-based solution
  • Okta: $60,000/month Snowflake spend for threat detection reduced substantially using parallel DuckDB instances
  • A data engineering team: 79% immediate reduction in Snowflake BI spend using DuckDB as a caching layer, 7x faster query times

None of these required abandoning Snowflake. They required deciding which workloads actually needed it.


FREQUENTLY ASKED QUESTIONS

Can you query Snowflake data in DuckDB without a Snowflake warehouse running?
Yes — two ways. Iceberg tables via the iceberg extension (no warehouse), or native tables via ADBC. Both require a brief initial connection, but once data is materialised locally, all subsequent queries are free.

What is ADBC and why is it faster than ODBC?
ADBC keeps data in columnar format throughout. ODBC forces columnar → row → columnar conversion. DuckDB benchmarks show ADBC up to 38x faster for transfers.

How much can I realistically save?
For short, high-frequency dashboard queries: 70–90% is consistent across documented cases. The 60-second minimum means a 4-second query costs 15x what it should.

Is DuckDB production-ready?
For single-node analytical workloads under a few terabytes: yes. Multi-user concurrency at scale and enterprise governance: not yet.

Do I need Iceberg?
No. ADBC works with native Snowflake tables. The main friction is private key encoding, which the docs don’t explain well.

Will this work with dbt?
Yes. dbt-duckdb lets you run your full dbt project locally against DuckDB. Pull source data once from Snowflake, develop and test for free, and deploy to Snowflake only in production. This eliminates cloud compute costs for the entire development loop. I’ve written about dbt native projects and pipeline patterns if you want more context on how this fits into a Snowflake-first stack.


Related links:

→ DuckDB official docs and installation
→ DuckDB ADBC benchmarks
→ Apache Arrow project
→ Snowflake query_history view docs
→ dbt-duckdb adapter on GitHub
→ Greybeam ADBC connection code on GitHub