I’ve been working with Snowflake for the past three years, and honestly, query optimization used to keep me up at night. Our monthly bills were climbing, queries were timing out, and my team was getting frustrated. If you’re reading this, you’re probably in a similar boat.
Let me share what I’ve learned through trial and error, some expensive mistakes, and eventually figuring out what actually moves the needle.
Why Your Queries Are Probably Slower (and More Expensive) Than They Should Be
Last month, I was debugging a dashboard that was taking forever to load. The query looked fine at first glance, but it was chewing through credits like crazy. Turns out, I was making three classic mistakes that most people miss.
The thing about Snowflake is that it’s incredibly powerful, but that power comes with responsibility. Unlike traditional databases where you might get away with sloppy queries, Snowflake will happily scan your entire data warehouse if you let it.
The Clustering Key Strategy That Cut Our Costs by 40%
Here’s a real scenario from our production environment. We had a massive events table with about 2 billion rows, and every query against it was painful.
-- Before: This was scanning almost the entire table
SELECT user_id, event_type, COUNT(*) as event_count
FROM events
WHERE event_date BETWEEN '2026-01-01' AND '2026-01-31'
GROUP BY user_id, event_type;
The query was taking 45 seconds and using a large warehouse. After adding a clustering key on event_date, the same query dropped to 8 seconds on a medium warehouse.
ALTER TABLE events CLUSTER BY (event_date);
But here’s what nobody tells you: clustering isn’t free. Automatic reclustering consumes credits in the background, so you need to be strategic. We only cluster on columns that appear frequently in WHERE clauses and have enough distinct values to prune effectively, but not so many that reclustering never settles; Snowflake’s own guidance warns against extremely high-cardinality keys, so if you want to cluster on a timestamp, cluster on a date or hour truncation of it instead. For our events table, event_date made perfect sense because almost every query filtered on it.
The sweet spot? Tables over 1TB that you query frequently with predictable filter patterns.
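Before and after adding a key, it’s worth checking how well clustered the table actually is. Snowflake ships built-in functions for exactly this (the table and column names below match the events example above):

```sql
-- Returns a JSON summary of clustering quality: average depth,
-- micro-partition overlap histogram, and any clustering notes.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');

-- Or just the average clustering depth (lower is better):
SELECT SYSTEM$CLUSTERING_DEPTH('events', '(event_date)');
```

If the average depth keeps shrinking after reclustering, pruning should improve; if it stays high, the key probably doesn’t match your data’s natural load order.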
Search Optimization Service: My New Secret Weapon
Snowflake rolled out some improvements to their Search Optimization Service this year, and it’s been a game changer for our point lookup queries. We have a products table where users constantly search by SKU or product name.
Before enabling search optimization, these queries were doing full table scans even though we had proper filters:
SELECT * FROM products WHERE sku = 'PROD-2026-XYZ-123';
After enabling it:
ALTER TABLE products ADD SEARCH OPTIMIZATION ON EQUALITY(sku, product_name);
Point lookups went from 3-4 seconds to under 200 milliseconds. The cost? About $2 per day for maintenance on a 50 million row table. Totally worth it for user-facing queries.
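You don’t have to guess at that maintenance cost; it’s queryable. A sketch against the ACCOUNT_USAGE view (which lags real time by up to a few hours):

```sql
-- Credits spent maintaining search optimization, per table, last 30 days
SELECT table_name,
       SUM(credits_used) AS so_maintenance_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.SEARCH_OPTIMIZATION_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY table_name
ORDER BY so_maintenance_credits DESC;
```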
The Result Cache Trick That’s Often Misunderstood
Everyone knows Snowflake caches results for 24 hours, but most people don’t optimize for it. I see developers constantly writing queries that can’t benefit from the cache.
Bad practice:
SELECT *, CURRENT_TIMESTAMP() as query_time
FROM sales
WHERE sale_date = CURRENT_DATE();
Every time this runs, CURRENT_TIMESTAMP() changes, so you get a cache miss. Same with CURRENT_DATE() in the WHERE clause.
Better approach:
-- Run this once at the start of your ETL job
SET query_date = CURRENT_DATE();
-- Then use the variable
SELECT * FROM sales WHERE sale_date = $query_date;
This simple change increased our cache hit rate from 12% to 68% for our daily reporting jobs.
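If you want to track your own hit rate, ACCOUNT_USAGE.QUERY_HISTORY doesn’t expose an explicit cache-hit flag, but a common heuristic is to treat successful SELECTs that scanned zero bytes as cache hits. A rough sketch under that assumption:

```sql
-- Approximate result-cache hit rate over the last 7 days.
-- Heuristic: a successful SELECT that scanned 0 bytes was almost
-- certainly served from the result cache, not a warehouse.
SELECT COUNT_IF(bytes_scanned = 0) / COUNT(*) * 100 AS approx_cache_hit_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_type = 'SELECT'
  AND execution_status = 'SUCCESS'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP());
```

It’s approximate (metadata-only queries also scan zero bytes), but it’s good enough to see whether a change like the variable trick above moved the needle.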
Materialized Views: When They’re Worth It (And When They’re Not)
I wasted a week last year building materialized views that actually made things worse. Here’s what I learned.
Materialized views work great when you have expensive aggregations that get queried repeatedly, but the base data doesn’t change often. We have a customer_lifetime_value table that aggregates data from multiple sources:
CREATE MATERIALIZED VIEW customer_ltv_summary AS
SELECT
    customer_id,
    SUM(order_total) AS total_revenue,
    -- COUNT(DISTINCT ...) isn't supported in materialized views;
    -- use APPROX_COUNT_DISTINCT, or plain COUNT if order_id is unique per row
    APPROX_COUNT_DISTINCT(order_id) AS order_count,
    AVG(order_total) AS avg_order_value,
    MAX(order_date) AS last_order_date
FROM orders
GROUP BY customer_id;
This view gets queried hundreds of times per day, but the underlying orders table only gets new data once daily during our ETL run. Perfect use case.
Bad use case? We tried materializing a view on our real-time events stream. The constant refreshing cost more than just running the queries directly.
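If you suspect one of your materialized views is in that second category, the refresh costs are queryable too. A sketch using the ACCOUNT_USAGE refresh history (the view reports each materialized view under table_name):

```sql
-- Credits spent refreshing each materialized view, last 30 days
SELECT table_name,
       SUM(credits_used) AS refresh_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.MATERIALIZED_VIEW_REFRESH_HISTORY
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY table_name
ORDER BY refresh_credits DESC;
```

Compare that number against what the equivalent ad-hoc queries would cost; if refreshes win, drop the view.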
The Warehouse Sizing Reality Check
I used to think bigger warehouses were always faster. Turns out, that’s not how it works.
For queries that process small amounts of data (under 100MB), an X-Small warehouse is often just as fast as an X-Large. We were using Large warehouses for everything because “we wanted it fast,” but we were just burning money.
Here’s my current rule of thumb:
∙ X-Small/Small: Point lookups, small aggregations, dev work
∙ Medium: Regular analytical queries processing under 1GB
∙ Large: Heavy aggregations, complex joins over 1GB
∙ X-Large and above: Only when you’re processing multiple terabytes or need serious parallelism
We also started using multi-cluster warehouses for our user-facing dashboards. During business hours, it auto-scales up to 3 clusters, then scales back down to 1 at night. No more queue times during peak hours, and we’re not paying for idle capacity.
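Here’s roughly what that configuration looks like (the warehouse name is illustrative, and multi-cluster warehouses require Enterprise edition or above):

```sql
-- Multi-cluster warehouse for user-facing dashboards:
-- scales out to 3 clusters under load, back to 1 when quiet,
-- and suspends entirely after 60 seconds of inactivity.
ALTER WAREHOUSE dashboard_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;
```

STANDARD scaling favors starting clusters quickly to kill queueing; ECONOMY waits longer before scaling out, trading some queue time for fewer credits.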
Query Pruning: The Feature You’re Probably Not Leveraging
Snowflake’s automatic partition pruning is amazing when you structure your data correctly. We cluster our fact tables by date and filter on that column consistently in WHERE clauses, so Snowflake can skip the micro-partitions that can’t contain matching rows.
This query scans maybe 1% of the table:
SELECT customer_id, SUM(amount)
FROM transactions
WHERE transaction_date BETWEEN '2026-01-01' AND '2026-01-07'
GROUP BY customer_id;
You can check how much pruning is happening with:
SELECT partitions_scanned, partitions_total
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_id = 'your_query_id';
Note that the INFORMATION_SCHEMA.QUERY_HISTORY() table function doesn’t expose partition counts; you need the ACCOUNT_USAGE view (which lags by up to 45 minutes) or the query profile in the UI.
Look at the PARTITIONS_SCANNED vs PARTITIONS_TOTAL ratio. If you’re scanning more than 20% of partitions regularly, your data structure needs work.
The JOIN Order Mistake Costing You Time
Snowflake’s cost-based optimizer usually picks a good join order on its own, but you can still help it, and help the humans reading your SQL. When joining tables, I write explicit join conditions and list the largest table first so the intent of the query is obvious.
Inefficient:
SELECT *
FROM small_table s, huge_table h, medium_table m
WHERE s.id = h.small_id
AND h.id = m.huge_id;
Better:
SELECT *
FROM huge_table h
INNER JOIN medium_table m ON h.id = m.huge_id
INNER JOIN small_table s ON h.small_id = s.id;
Also, I always use explicit INNER JOIN syntax instead of comma-separated FROM clauses. The optimizer treats the two forms the same, but explicit joins are clearer to read and make it much harder to ship an accidental cross join by forgetting a WHERE predicate.
Monitoring That Actually Helps
Every week, I run this query to find our most expensive operations:
SELECT
    query_type,
    warehouse_name,
    user_name,
    AVG(execution_time/1000) AS avg_seconds,
    -- cloud services credits only; warehouse compute is metered separately
    SUM(credits_used_cloud_services) AS cloud_services_credits,
    COUNT(*) AS query_count
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
GROUP BY 1, 2, 3
HAVING cloud_services_credits > 1
ORDER BY cloud_services_credits DESC
LIMIT 20;
This shows which query types, warehouses, and users are driving cloud services spend, and the query counts and average runtimes point at where warehouse compute is going too. (Per-query compute credits aren’t in QUERY_HISTORY; for warehouse-level compute, check WAREHOUSE_METERING_HISTORY.)
What’s Actually Working for Us Right Now
After months of optimization, here’s what made the biggest difference:
1. We reduced our warehouse usage by 45% just by right-sizing and using auto-suspend aggressively (1 minute timeout for dev, 5 minutes for production).
2. Clustering our largest tables on date columns cut query times in half for 80% of our analytics workload.
3. Moving to task-based scheduling instead of always-on warehouses saved us about $3000 monthly.
4. Teaching our analysts to use LIMIT during development seems obvious, but it made a huge difference in reducing waste.
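For reference, items 1 and 3 boil down to a couple of statements. The warehouse, task, and procedure names here are illustrative, not our actual setup:

```sql
-- Aggressive auto-suspend: 60s for dev, 300s for production
ALTER WAREHOUSE dev_wh  SET AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;
ALTER WAREHOUSE prod_wh SET AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;

-- Task-based scheduling: the warehouse only runs while the task does
CREATE OR REPLACE TASK nightly_etl
  WAREHOUSE = etl_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'  -- 2:00 AM UTC daily
AS
  CALL run_nightly_etl();  -- hypothetical stored procedure

ALTER TASK nightly_etl RESUME;  -- tasks are created suspended
```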
The Bottom Line
Query optimization in Snowflake isn’t about one magic trick. It’s about understanding your workload, monitoring what’s actually happening, and making incremental improvements.
Start with the low-hanging fruit: right-size your warehouses, add clustering to your biggest tables, and make sure your queries can leverage the result cache. Then move on to more advanced techniques like search optimization and materialized views for specific use cases.