How the Warehouse Cache Actually Works in Snowflake

A dashboard that ran in four seconds on Monday took nineteen seconds on Tuesday. Same query, same data, same warehouse size. I spent the better part of an hour convinced Snowflake was having a bad day, checking the status page, refreshing query history, muttering about “platform issues” in our team Slack — before I noticed our DevOps script had quietly added an aggressive auto-suspend policy the night before. The warehouse cache was getting wiped every single morning, and I’d built our entire “fast dashboard” reputation on a cache that reset itself before anyone showed up to work.

TL;DR

Snowflake’s warehouse cache (local disk cache) stores raw compressed micro-partition data on each node’s SSD, not query results.
It’s separate from the result cache and metadata cache — three different caches, three different jobs.
Auto-suspending a warehouse wipes this cache completely. Resuming starts it cold every time.
You can verify it’s working with the ‘percentage scanned from cache’ field in query profile or ACCOUNT_USAGE.QUERY_HISTORY.
Multi-cluster warehouses don’t share this cache — a query routed to a new cluster starts cold even if a sibling cluster is warm.
You can’t manually size or pin it. The only real lever you control is the auto-suspend timer.

That mistake is what got me actually reading how this thing works instead of just trusting that Snowflake would “handle it.” Turns out the warehouse cache isn’t magic, isn’t tunable, and isn’t the same thing as the result cache most people learn about first. Here’s what it actually is, where it lives, and how to tell when it’s helping you versus quietly costing you money.

Diagram showing a cloud data warehouse architecture with three layers: Cloud services (result and metadata cache), compute layer with three SSD nodes, and a database storage layer with immutable micro-partitions. Arrows indicate data flow.

Three caches, one name people use for all of them

Snowflake has three distinct caching layers, and the confusion starts because people say “Snowflake caches my query” without specifying which one did the work. The result cache sits in the cloud services layer and stores entire finished query results — if the same exact SQL text runs again within 24 hours and the underlying data hasn’t changed, you get the answer back with bytes_scanned = 0 and zero compute cost. The metadata cache, also in cloud services, holds statistics about every micro-partition — min and max values per column, row counts — so Snowflake can decide which partitions to skip before it ever touches the data.

The warehouse cache is the third one, and it’s the one this article is actually about. It lives on the local SSD of every node in a running virtual warehouse, and it stores raw, compressed micro-partition data — not query results, not aggregated answers, the actual columnar bytes that got pulled from remote storage to answer a scan.

Why this distinction actually matters

If you only know the result cache exists, you’ll misdiagnose a lot of performance issues. Change one character in a WHERE clause or add a comment, and the query bypasses the result cache entirely, because reuse requires the new SQL text to match the earlier query exactly. Running it under a role that lacks the required privileges on the underlying tables has the same effect. The warehouse cache doesn’t care about query text at all. It cares about which micro-partitions a query needs and whether those bytes are already sitting on a node’s SSD from a previous scan.
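As a rough illustration of the difference, here are two queries against a hypothetical orders table (the table and column names are assumptions, not a real schema). The SQL text differs, so the second one cannot be answered by the result cache, but both filter the same way and read the same columns, which is the situation where a warm warehouse cache can help.

```sql
-- Hypothetical table and columns, for illustration only.

-- Query 1: pulls the matching micro-partitions from remote storage
-- if nothing is cached yet.
SELECT SUM(amount) AS total_amount
FROM orders
WHERE order_date >= '2026-01-01';

-- Query 2: different SQL text, so the result cache cannot answer it.
-- It filters the same way and reads the same columns, though, so the
-- micro-partition data Query 1 pulled onto local SSD is eligible for reuse.
SELECT AVG(amount) AS avg_amount
FROM orders
WHERE order_date >= '2026-01-01';
```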

What a micro-partition actually is

You can’t understand the warehouse cache without understanding the unit it stores. When data lands in a Snowflake table, it gets automatically carved into micro-partitions — contiguous, immutable blocks holding somewhere between 50MB and 500MB of uncompressed data each, stored in a columnar format. There’s no manual partitioning scheme to design, no index to build. Snowflake just does this on every load.

Each micro-partition carries its own metadata: minimum and maximum values for every column, which is exactly what the metadata cache is built from. When you filter a query on order_date > '2026-01-01', Snowflake checks that metadata first and skips any micro-partition whose max date falls before that threshold. That’s partition pruning, and it happens before a single byte gets pulled into the warehouse cache. Pruning decides what to read; the warehouse cache decides how fast a repeat read of the same partitions will be.
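To see how much pruning a workload is actually getting, a sketch like the following compares partitions scanned against partitions total for recent queries. The warehouse name is a placeholder, and ACCOUNT_USAGE views lag real time, so very recent queries may not appear yet.

```sql
-- Pruning ratio for recent queries on one warehouse.
-- 'REPORTING_WH' is a placeholder; substitute your own warehouse name.
SELECT
    query_id,
    partitions_scanned,
    partitions_total,
    partitions_scanned / NULLIF(partitions_total, 0) AS fraction_scanned
FROM snowflake.account_usage.query_history
WHERE warehouse_name = 'REPORTING_WH'
  AND start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP())
  AND partitions_total > 0
ORDER BY fraction_scanned DESC  -- worst-pruned queries first
LIMIT 20;
```

A fraction close to 1 means the query read nearly every micro-partition in the tables it touched, and no amount of cache warmth changes how many partitions it has to read.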

The actual lifecycle of the warehouse cache

Here’s the sequence that matters in practice. A warehouse resumes from suspended state with completely empty SSD — there’s nothing cached because the compute nodes assigned to it are freshly provisioned. The first query that touches a table has to pull every relevant micro-partition from remote cloud storage, which is the slowest tier in the whole architecture. As those partitions get read, they’re written to the local SSD cache as a side effect — not because you asked for caching, just because that’s what happens when a node reads remote data.

The second query — if it touches the same micro-partitions and the warehouse is still running — can read those bytes off local SSD instead of going back to remote storage. This is meaningfully faster, and it’s why a sequence of similar queries against the same table speeds up the longer a warehouse stays warm. There’s no explicit “build the cache” step. Cache population is a byproduct of usage, which is exactly why a single cold query tells you almost nothing about real-world performance.

[Figure: Snowflake cache warm/cold cycle]

Then the warehouse suspends, and it’s gone

This is the part that bit me. When a warehouse auto-suspends, the compute nodes it was using get released back into Snowflake’s shared pool. The SSD on those specific nodes goes with them. When the warehouse resumes — even if it’s seconds later, even if it’s the exact same warehouse name — there’s no guarantee you get the same physical nodes back, and the cache starts from zero regardless. There’s no persistence, no “save state before suspending.” It’s just gone.
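If you suspect a suspend policy changed underneath you, it takes two statements to check what a warehouse is currently set to. This is a minimal sketch; the warehouse name is a placeholder.

```sql
-- 'REPORTING_WH' is a placeholder warehouse name.
SHOW WAREHOUSES LIKE 'REPORTING_WH';

-- SHOW output columns are lowercase, so they need double quotes here.
-- auto_suspend is reported in seconds.
SELECT "name", "state", "auto_suspend", "auto_resume"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```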

Resizing can do similar damage. Sizing down removes nodes, and whatever was cached on their SSDs goes with them. Sizing up adds nodes that start with empty SSDs, so queries that land on them run cold no matter how warm things were five minutes earlier.

How to actually see this working

Stop assuming and go look at it. Open the query profile for any query in Snowsight and check the IO statistics panel: there’s a field called “Percentage scanned from cache” (percentage_scanned_from_cache in query history). Turn off result reuse for the session first by setting USE_CACHED_RESULT to FALSE, otherwise an identical rerun is answered by the result cache and never touches the warehouse. Then run the same query twice in a row on a warm warehouse and watch that number jump from near 0% to something much higher on the second run. Suspend the warehouse, resume it, run the same query again, and watch it drop back to 0%. That’s the entire mechanism, visible in about ninety seconds of testing.
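The experiment can be scripted roughly as follows. All object names are placeholders, suspending requires the appropriate privilege on the warehouse, and you should not run this against a warehouse other people are actively using. Result reuse is switched off so that the repeat runs actually scan data.

```sql
-- Placeholder names: REPORTING_WH, orders, amount, order_date.
USE WAREHOUSE REPORTING_WH;

-- Keep the result cache out of the experiment for this session.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Run 1: cold or partly warm, depending on what ran before.
SELECT SUM(amount) FROM orders WHERE order_date >= '2026-01-01';

-- Run 2: same partitions, should now be served largely from local SSD.
SELECT SUM(amount) FROM orders WHERE order_date >= '2026-01-01';

-- Drop the cache by suspending, then bring the warehouse back.
ALTER WAREHOUSE REPORTING_WH SUSPEND;
ALTER WAREHOUSE REPORTING_WH RESUME;

-- Run 3: expect the cache percentage to fall back toward zero.
SELECT SUM(amount) FROM orders WHERE order_date >= '2026-01-01';
```

Check the query profile of each run and compare the cache percentage across the three.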

For a wider view across your account, query history gives you the same field at scale. This is the query I run when someone asks “is our caching even helping”:

Check your real cache hit rate (last 30 days, by warehouse)

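-- percentage_scanned_from_cache is stored as a fraction between 0 and 1,
-- so multiplying it by bytes_scanned gives the bytes served from local SSD.
SELECT
    warehouse_name,
    COUNT(*) AS query_count,
    SUM(bytes_scanned) AS bytes_scanned,
    SUM(bytes_scanned * percentage_scanned_from_cache) AS bytes_from_cache,
    SUM(bytes_scanned * percentage_scanned_from_cache)
        / SUM(bytes_scanned) AS pct_scanned_from_cache
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
  AND bytes_scanned > 0  -- skip queries that scanned nothing
GROUP BY 1
ORDER BY 5;  -- lowest cache hit rate first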

A low percentage here on a warehouse running frequent, similar queries is a signal — either your auto-suspend timer is too aggressive for the workload, or the queries aren’t actually similar enough at the data level to benefit from a warm cache, even if the SQL looks similar to a human reading it.
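One way to tell those two causes apart is to break the same ratio down by hour of day. If the hit rate is poor only in the first hour after people log on and healthy afterwards, overnight suspension is the likely culprit. This is a sketch built on the same query history columns; HOUR() is evaluated in the session time zone.

```sql
SELECT
    warehouse_name,
    HOUR(start_time) AS hour_of_day,
    COUNT(*) AS query_count,
    SUM(bytes_scanned * percentage_scanned_from_cache)
        / SUM(bytes_scanned) AS pct_scanned_from_cache
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
  AND bytes_scanned > 0
GROUP BY 1, 2
ORDER BY 1, 2;
```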

Result cache, warehouse cache, metadata cache — side by side

Cache Layer	Where It Lives	What It Stores	Cleared When	Compute Cost
Result Cache	Cloud Services layer	Full query results	24 hours without reuse, or when the underlying data changes	Zero, no warehouse needed
Warehouse Cache	SSD on each compute node	Raw compressed micro-partitions	Warehouse suspends, resizes, or node is replaced	Warehouse must be running
Metadata Cache	Cloud Services layer	Min/max values, row counts, partition stats	Rarely — persists with the table	Zero — used for pruning before scan

The auto-suspend tradeoff nobody explains clearly

Snowflake’s own guidance generally points toward short auto-suspend windows to control credit spend, and for spiky, unpredictable workloads that’s the right call. But if a warehouse runs frequent, similar queries back-to-back — a BI tool polling dashboards, an analyst iterating on the same fact table — an aggressive suspend timer means you’re paying the “cold scan” tax on nearly every query, because the cache never gets the chance to stay warm between them.

The fix isn’t complicated once you see the tradeoff: separate warehouses by access pattern. A reporting warehouse that gets hit constantly during business hours can run a longer suspend window, or stay up during known peak hours, while a warehouse running sporadic ad-hoc analyst queries can suspend aggressively without losing much, since the cache wasn’t going to be useful between unrelated queries anyway.
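In practice that split is a pair of ALTER WAREHOUSE statements. The warehouse names and timer values below are illustrative assumptions, not recommendations; pick numbers from your own query history.

```sql
-- Warehouse names and timer values are illustrative assumptions.
-- AUTO_SUSPEND is expressed in seconds.

-- Dashboards hit this one constantly during the day: keep it warm longer.
ALTER WAREHOUSE REPORTING_WH SET AUTO_SUSPEND = 600;

-- Sporadic ad-hoc queries: little cache reuse to protect, so suspend fast.
ALTER WAREHOUSE ADHOC_WH SET AUTO_SUSPEND = 60;
```

A longer timer costs credits for idle minutes, so it is worth rechecking the cache hit rate after the change to confirm the extra spend bought something.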

The multi-cluster gotcha

If you’re running a multi-cluster warehouse for concurrency, know that clusters don’t share this cache with each other. Cluster A being fully warm doesn’t help a query that gets routed to Cluster B when Snowflake spins up a new cluster to handle a concurrency spike. That new cluster starts cold, scans from remote storage, and only builds its own local cache from that point forward. Teams chasing consistent query latency under high concurrency often get surprised by this — the warehouse “should” be warm, and on average it is, but any individual query can still land on a cold cluster.
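Query history records which cluster ran each query, so the per-cluster picture is easy to pull. The sketch below groups the same cache ratio by cluster_number; clusters that only spin up during spikes will usually show a noticeably lower figure than cluster 1.

```sql
SELECT
    warehouse_name,
    cluster_number,
    COUNT(*) AS query_count,
    SUM(bytes_scanned * percentage_scanned_from_cache)
        / SUM(bytes_scanned) AS pct_scanned_from_cache
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND bytes_scanned > 0
  AND cluster_number IS NOT NULL
GROUP BY 1, 2
ORDER BY 1, 2;
```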

What the warehouse cache does not do

It’s worth being precise about the boundaries here, because I’ve seen this cache get credited for things it isn’t responsible for. It doesn’t store query results — that’s the result cache’s job, and it’s a different layer entirely with a different lifetime. It doesn’t help with intermediate computation that spills to local disk during a large sort or hash join — that’s a separate spillage mechanism tracked under bytes spilled to local storage in query profile, not the same SSD allocation conceptually even though it physically lives in a similar place. And it provides no benefit on the write path for a fresh INSERT into new micro-partitions, since there’s nothing previously cached to reuse.
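Spilling shows up in its own query history columns, separate from the cache percentage, which makes it straightforward to check whether a slow query is a spill problem instead of a cold-cache problem. A minimal sketch:

```sql
SELECT
    query_id,
    warehouse_name,
    bytes_spilled_to_local_storage,
    bytes_spilled_to_remote_storage,
    percentage_scanned_from_cache
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND (bytes_spilled_to_local_storage > 0
       OR bytes_spilled_to_remote_storage > 0)
ORDER BY bytes_spilled_to_remote_storage DESC,
         bytes_spilled_to_local_storage DESC
LIMIT 20;
```

A query can have a high cache percentage and still be slow because it spills heavily; the two numbers answer different questions.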

What’s actually worth doing about this

You don’t get a dial to resize this cache or pin specific tables into it, so the practical levers are all about behavior, not configuration. Match auto-suspend timers to actual access patterns instead of using one default across every warehouse. If a workload is genuinely cache-sensitive — recurring dashboards, iterative analyst sessions — consider a short warm-up query immediately after resume rather than letting the first real user query eat the cold-start cost. And when you’re debugging a “why did this get slower” ticket, percentage_scanned_from_cache should be one of the first three things you check, right alongside partition pruning stats, before you start blaming the query itself.
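A warm-up step might look something like this. Every object name here is a placeholder, and the right columns and date range depend entirely on what your dashboards read. The query needs to scan real data to be useful, so result reuse is turned off and the aggregate is chosen so it cannot be answered from metadata alone.

```sql
-- Placeholder names: REPORTING_WH, orders, amount, customer_id, order_date.
ALTER WAREHOUSE REPORTING_WH RESUME IF SUSPENDED;
USE WAREHOUSE REPORTING_WH;

-- Make sure the warm-up actually scans instead of reusing a stored result.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Touch the columns and date range the dashboards read. A bare COUNT(*)
-- is typically answered from metadata and warms nothing.
SELECT SUM(amount), COUNT(DISTINCT customer_id)
FROM orders
WHERE order_date >= DATEADD(day, -30, CURRENT_DATE());
```

The warm-up itself pays the cold-scan cost, so it only makes sense when it runs before users arrive, for example from a scheduled job shortly before business hours.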

For the deeper mechanics of how partition pruning interacts with clustering keys, the official Snowflake documentation on warehouse cache optimization is worth reading directly — it’s one of the rare vendor docs pages that actually shows the diagnostic query instead of just describing the concept. The micro-partitions and clustering documentation is the right follow-up if pruning efficiency turns out to be your actual bottleneck instead of cache temperature.

Does the Snowflake warehouse cache get cleared when the warehouse suspends?

Yes, completely. The warehouse cache lives on the SSD of the compute nodes assigned to that warehouse, and those nodes get released back to the pool on suspend. When the warehouse resumes — even seconds later — it’s starting from zero cached data.

Why does percentage_scanned_from_cache show 0% on a brand new warehouse?

Because there’s nothing to scan yet. The first query against any table after a cold start has to pull every micro-partition it needs from remote storage. Cache population happens as a side effect of running queries, not in advance.

Does the warehouse cache help with INSERT, UPDATE, or DELETE performance?

📷 the SSD cache layer, mid-rebuild after a warehouse resume — not glamorous, but it’s where the speed comes from