Ask ten data engineers what happens when you run a query in Snowflake and most of them will tell you the same thing: the warehouse runs it. SQL goes in,…

Ask ten data engineers what happens when you run a query in Snowflake and most of them will tell you the same thing: the warehouse runs it. SQL goes in, the warehouse chews on it, results come out. Clean, simple, and just wrong enough to cost you money. The truth is that Snowflake query execution is a trip through three separate layers, and the one you actually pay for is the last to get involved — if it gets involved at all.

Flowchart of the life of a Snowflake query: query submitted, cloud services layer parses and prunes, result cache check returns a cached result on a hit with zero credits, and on a miss the virtual warehouse runs it across MPP nodes before fetching micro-partitions from storage and returning results.

The path every query takes — and the shortcut a cache hit gets to skip compute entirely.

TL;DR

→ Running a query isn’t one step. It’s a trip through three layers that scale independently, and the compute you pay for is the last one to wake up.

→ The cloud services layer does the thinking first — parses, plans, and prunes — using per-partition metadata to throw out data that can’t match your filter before a single byte is read.

→ If the exact same query ran in the last 24 hours and the data hasn’t moved, you get a cached result for free. No warehouse. No credits. Milliseconds.

→ Only on a cache miss does a virtual warehouse actually spin up and crunch the query across its nodes in parallel.

→ Warehouses keep recently-read micro-partitions on local SSD, but that cache vanishes the second the warehouse suspends.

→ “Identical” is brutally literal. A stray table alias, lowercase keywords, or a RANDOM() call quietly knocks you off the cache and back onto billed compute.

→ When a query is slow, it’s usually scanning too much, not computing too much. A bigger warehouse fixes the second problem and never the first.

The mental model that’s quietly costing you money

Here’s the reframe that changes how you think about every query: the virtual warehouse is the last thing to get involved, and plenty of queries never touch it at all. Almost everything that decides whether your query is fast or slow, cheap or expensive, happens in a layer you probably weren’t even picturing.

Get this wrong and you end up doing what I’ve watched teams do a dozen times: a dashboard feels sluggish, someone bumps the warehouse from Medium to Large, it gets a little faster, everyone moves on. The bill goes up. The query was never compute-bound in the first place. They paid more to run the same broken query faster.

So let’s actually follow a query, from the moment you hit run to the moment rows come back.

The three layers a query passes through

Snowflake splits into three layers, and the thing to internalize is that they scale completely independently. That’s not a trivia fact — it’s the reason the whole platform behaves the way it does. Every query touches all three. Some only touch the first.

Diagram of Snowflake’s three layers: the cloud services layer handling parse, optimize, prune and result cache at the top; the compute layer with virtual warehouse MPP nodes in the middle; and the storage layer holding micro-partitions in S3, Azure Blob or GCS at the bottom, with a query entering the top and a cached result returning from it

The cloud services layer is the brain. The warehouse is just the muscle.

At the top sits the cloud services layer — the brain. Logins, sessions, parsing, query planning, transaction coordination, all the metadata: it lives here. Underneath that is the compute layer, made up of virtual warehouses. These are the MPP clusters that do the actual SQL grunt work. And at the bottom is storage: your data sitting in cloud object storage (S3, Azure Blob, or GCS) as immutable, compressed, columnar micro-partitions.

The magic is that none of these share resources with each other. You can resize compute without moving a single byte of storage. You can point ten warehouses at the same table and none of them slow the others down. Keep that in your back pocket, because it’s exactly why the “just make the warehouse bigger” instinct fails so often.

Step 1: The cloud services layer does the thinking

Your query lands, and before any table data gets read, the cloud services layer goes to work.

It parses the SQL, and the cost-based optimizer builds a plan. Then it does the single most important thing for performance, and it does it for free: partition pruning. Every micro-partition carries metadata, including the min and max value for each column. So when you write WHERE order_date = '2026-06-01', the optimizer reads that metadata first and skips every partition whose range can’t possibly hold a matching row. Ten thousand partitions in the table, three that actually get scanned. No index. No tuning. You didn’t do anything.

This is also why a query can come back in milliseconds with no warehouse at all. Run a SELECT COUNT(*) or a MAX() and Snowflake often answers straight from the metadata cache — you’ll see a lone METADATA-BASED RESULT step in the query profile and zero compute on the bill.

How good the pruning is comes down to how well your data is naturally ordered, which is the whole point of clustering and Snowflake’s automatic optimization features.

Step 2: The result cache check, before any compute

Now, still before any warehouse gets involved, cloud services checks the result cache. Identical query in the last 24 hours, data hasn’t changed since? Snowflake just hands back the stored result from the cloud services layer. Nothing resumes. Nothing gets billed. The answer is basically instant.

This is the cheapest query you’ll ever run, and it’s the reason refreshing a dashboard a second time costs nothing. Here’s the part that surprises people: every time that cached result gets reused, the 24-hour clock resets. Keep hitting it and a result can stay alive for up to 31 days. A popular dashboard query can effectively sit in cache for a month and never cost a credit.

Which brings us to the catch. “Identical” is doing a lot of work in that sentence, and it’s the thing nobody reads about until they’re staring at a bill wondering where the money went.

Step 3: The virtual warehouse finally runs it

Cache miss. Now — and only now — the query goes to a virtual warehouse. If it was asleep, it resumes (a second or two, usually). The warehouse is a little cluster of compute nodes, and this is where massively parallel processing earns its name: the surviving micro-partitions get split across the nodes and cores, and the filters, joins, and aggregations all happen in parallel.

This is the layer with a meter running. You’re billed by the second, with a 60-second minimum every time a warehouse starts or resumes. That minimum trips people up constantly — a warehouse that keeps flickering on and off for tiny one-off queries can quietly cost more than one you just leave warm, because every single resume restarts that 60-second clock. It’s also the only layer where resizing actually helps, and that matters less often than you’d think.

Step 4: Storage, and the cache that disappears

The warehouse goes and fetches the micro-partitions that survived pruning. Each micro-partition holds 50–500 MB of uncompressed data in a columnar format, so only the columns you actually asked for get read. Ask for two columns out of fifty and you pay to read two.

And there’s a second cache down here. As the warehouse pulls partitions from remote storage, it stashes them on local SSD. Next query on that same warehouse that needs the same data? It reads from SSD instead of making the round trip to object storage, which is a lot faster. This is what people mean when they say a warehouse “warms up.”

The catch — and there’s always a catch — is that this cache gets wiped the instant the warehouse suspends. That’s the real trade-off hiding inside your AUTO_SUSPEND setting. Suspend fast and you stop paying for idle time but you toss the warm cache. Leave it running and you keep the cache but pay for the idle. A lot of teams land on 60 seconds, but honestly the right answer depends entirely on how often your queries actually fire.

Three caches, one comparison to bookmark

There are three caches, they live in different layers, and mixing them up is behind half the “wait, why was that slow?” conversations I’ve ever had.

Cache	Layer	What it stores	Survives suspend?	Compute cost
Result cache	Cloud services	Final query result sets	Yes (24h, resets on reuse, 31-day max)	None
Metadata cache	Cloud services	Row counts, min/max, distinct counts	Yes	None
Local disk (warehouse) cache	Compute	Micro-partitions read from storage	No — purged on suspend	Warehouse already running

The two cloud-services caches are shared across the whole account — every warehouse, every user benefits. The local disk cache belongs to one warehouse and dies with it. If you want the gory details with real query-profile screenshots, Snowflake’s own community piece on caching is the place to go.

The cache rules nobody warns you about

The result cache only kicks in when Snowflake decides the new query is identical to the old one, and “identical” is far more literal than anyone expects. Snowflake’s own docs spell it out: add a table alias, or just retype your keywords in lowercase, and you miss the cache. The query goes to billed compute and you never get a warning.

It also bails on the cache if the query contains a non-deterministic function — UUID_STRING, RANDOM, RANDSTR are the usual suspects — or an external function, or if it reads from a hybrid table. And the role running it needs privileges on every table involved.

⚠️ What this means in practice: if you want your dashboards and scheduled jobs riding the cache, lock down the exact text your BI tool emits. Random aliasing, dynamic comments, an injected session variable, a stray current_timestamp() — any of it silently drops you off the cache, and you start paying to recompute answers you already had. The Query Cost Estimator is a quick way to put a number on what those misses are costing you.

Why a bigger warehouse usually isn’t the answer

This is the most expensive misunderstanding in all of Snowflake, so it’s worth being blunt about it. A query is slow for one of two reasons: it’s scanning too much data, or it’s doing too much actual computation. Resizing the warehouse only helps the second one.

When the real problem is bad pruning — Snowflake had to scan most of the table because the data wasn’t laid out in a way it could skip — a bigger warehouse just reads the same mountain of data faster and charges you more per second for the privilege. The fix isn’t more nodes. It’s better pruning: clustering, a tighter filter, a smarter query shape. The query profile tells you which world you’re in. Look at partitions scanned versus partitions total. If you’re chewing through 95% of them, no warehouse size on earth is going to save you.

And sometimes the honest answer to a giant, I/O-heavy scan isn’t a bigger Snowflake warehouse at all — it’s moving that work somewhere cheaper, which is the whole idea behind running the query in DuckDB instead. If you do decide a resize is warranted, the warehouse sizing guide will at least keep you from overshooting.

What I actually check when a query misbehaves

When something’s slow or weirdly expensive, here’s where I start. All of it comes out of the query profile and INFORMATION_SCHEMA — no guessing, no vibes.

-- Did your last query hit the result cache?
-- A cache hit shows bytes_scanned = 0 and a near-zero execution time.
SELECT query_id,
       query_text,
       bytes_scanned,
       percentage_scanned_from_cache,
       execution_time / 1000 AS exec_seconds
FROM TABLE(information_schema.query_history())
ORDER BY start_time DESC
LIMIT 10;

-- Turn the result cache off for honest benchmarking (this session only)
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
-- ...run your test queries...
ALTER SESSION UNSET USE_CACHED_RESULT;

-- Hunt down your worst pruning offenders
SELECT query_id,
       partitions_scanned,
       partitions_total,
       ROUND(100 * partitions_scanned / NULLIF(partitions_total, 0), 1) AS pct_scanned,
       bytes_scanned / 1e9 AS gb_scanned
FROM snowflake.account_usage.query_history
WHERE partitions_total > 0
ORDER BY bytes_scanned DESC
LIMIT 20;

-- Balance warm cache against idle cost
ALTER WAREHOUSE analytics_wh SET AUTO_SUSPEND = 60;  -- seconds

The number to stare at is pct_scanned in that third query. High on your slow queries? You’ve got a pruning problem, and that’s where your time should go — not the warehouse dropdown.

What this actually costs you

Architecture is interesting, but the bill is what makes it matter. So let’s put real numbers on it. Credit prices swing with your edition, cloud, and region, so treat these as illustrative — drop your own rate into the cost calculator for the exact figure.

Picture one ordinary dashboard query. It runs on a Medium warehouse (4 credits an hour, call it roughly $3 a credit), and it fires 100 times a day because that’s how often people open the dashboard. Here’s how the same query plays out depending on whether the layers are doing their job:

Scenario	What actually runs	~Cost / day	~Cost / month
No cache, bad pruning (30s/run, scans ~95% of partitions)	100 full compute runs	~$10	~$300
Result cache hits 90% of the time	10 compute runs, 90 free cached returns	~$1	~$30
Cache + good pruning (3s/run, scans ~5%)	10 runs at a tenth of the scan	~$0.10	~$3

Same query. Same data. Roughly $300 a month versus $3, and the only difference is whether you let the result cache and pruning do what they’re built to do. Now multiply that across a dashboard with thirty queries on it, and you can see how a Snowflake bill quietly triples without anyone writing a single new query.

The gotchas nobody warns you about

The 60-second minimum punishes spiky workloads. A query that finishes in two seconds still bills a full minute if the warehouse had to resume to run it. Set AUTO_SUSPEND too aggressively on a workload that fires a query every couple of minutes and you’ll pay more in cold-start minimums than you ever saved on idle time. Aggressive suspend is not automatically cheaper.

One write nukes the whole result cache. The cache is invalidated by any change to the underlying table, not just changes to the rows your query touched. A single late-arriving record at 2 a.m. quietly wipes the cached result for every dashboard query built on that table, and tomorrow morning they all run on compute again. If your “cached” dashboard mysteriously costs money some mornings, this is usually why.

Cloud services is free, right up until it isn’t. The parsing, planning, and pruning in the cloud services layer is free — but only while it stays under 10% of your daily warehouse compute. Hammer it with thousands of tiny metadata queries or relentless INFORMATION_SCHEMA polling and you cross that line, and Snowflake starts billing the overage. Most teams never hit it; the ones running huge volumes of trivial queries do, and they never see it coming.

The local cache doesn’t follow you between warehouses. That warm SSD cache belongs to one specific warehouse. Run your ETL on one warehouse and your reporting on another — sensible workload isolation — and the reporting warehouse gets exactly zero benefit from the partitions ETL just pulled. Each warehouse warms its own cache from cold.

SELECT * throws away the columnar advantage. Snowflake only reads the columns you ask for. Ask for all of them and you pay to read all of them, even when the dashboard displays three. In a wide table that’s the difference between scanning a few columns and dragging the entire row off storage.

The mistakes that quietly drain the budget

Almost every overspending Snowflake account I’ve looked at is making some combination of these five:

1. Leaving AUTO_SUSPEND at the 600-second default. Ten full minutes of paid idle after every burst of activity. On a warehouse that’s used in short bursts, that idle time can dwarf the actual query time. Most analytics warehouses are fine at 60 seconds.

2. One giant warehouse for everything. Pile ETL, ad-hoc analysis, and dashboards onto a single warehouse and they fight over the cache and the compute. Separate warehouses per workload keep each cache warm and each workload predictable.

3. Scaling up when the real fix is pruning. The reflex we opened with. If pct_scanned is high, a bigger warehouse just burns more credits reading the same data. Fix the layout, not the size.

4. Letting the BI tool emit sloppy SQL. Inconsistent aliases, injected timestamps, and per-user comments mean “the same” dashboard query is never byte-for-byte identical, so it never reuses the result cache. Standardize what the tool sends.

5. Over-clustering a hot table. Automatic clustering isn’t free — it spends credits reorganizing data in the background. On a table that’s written constantly, that background churn can cost more than the queries it speeds up. Cluster the tables you read far more than you write.

The one principle to take away

If you remember nothing else, remember this: scan less, don’t compute faster. Snowflake decides almost everything about a query’s speed and cost before the warehouse ever wakes up — in the pruning, in the caches, in how your data is laid out. The warehouse size is the last lever you should reach for, not the first. Get the layers above it working and most “we need a bigger warehouse” conversations simply stop happening.

Snowflake Query Execution: what really happens under the hood