Snowflake Query Tuning — Expert Interview Questions

AdvancedLast updated: 2026-04-09 • 3 sections

Expert questions on micro-partition pruning, clustering keys, search optimization, query profiling, spilling to disk, and caching layers.

Pruning and Clustering

Q: Explain micro-partition architecture and performance impact.

Immutable columnar files 50-500MB uncompressed. Metadata per column: min/max, distinct count, null count. Performance depends on partition pruning: WHERE clause checks metadata to skip non-matching partitions. Chronologically loaded tables prune well on date. Random-order tables prune poorly, needs clustering.

Q: How does CLUSTER BY work and how to choose keys?

Co-locates rows with similar values in same partitions via background reclustering. Choose: most frequent WHERE/JOIN columns, low-to-medium cardinality first, max 3-4 columns. When: tables over 1TB with over 50% partition scans, CLUSTERING_INFORMATION depth over 5-10. Skip: under 100GB, varied query patterns, high write churn.

Q: Query scans 500GB, returns 100 rows, 0% pruning. How to fix?

Add CLUSTER BY on filter column. High-cardinality filter (user_id)? Use Search Optimization Service for point lookups. For joins: cluster both tables on join key. Quick fix: materialize filtered subset.

Q: Explain spilling to local disk vs remote storage.

When intermediate results exceed warehouse memory: local SSD (2-5x slower), then remote cloud storage (10-50x slower). Query Profile shows Bytes Spilled. Fix: scale up warehouse (doubles memory per size), reduce result set, cluster join keys, add LIMIT for sorts. Remote spill over 0 means severely memory-constrained.

Q: When does Search Optimization outperform clustering?

SOS builds bloom-filter-like indexes. Better for: high-cardinality point lookups (user_id, email), LIKE/ILIKE substring searches, geospatial queries, secondary columns on already-clustered tables. Serverless background cost, only worthwhile for frequent selective lookups.

Profiling and Caching

Q: How to diagnose a slow query with Query Profile?

Systematic: (1) Time breakdown: compilation vs queuing vs execution. Queuing = warehouse overloaded. (2) Most expensive operator (thickest arrow): TableScan (poor pruning), JoinFilter (large intermediates), Sort (spill), Aggregate (high cardinality). (3) Per operator: partitions scanned vs total, bytes spilled, rows produced vs input. (4) Anti-patterns: cartesian joins, UNION instead of UNION ALL.

Q: Explain the three cache layers.

Result cache: full query results 24hrs, zero credits, invalidated on data change. Metadata cache: min/max/count per partition, COUNT(*) and MIN/MAX instant, always active. Warehouse cache: warm SSD of recently read partitions, cleared on suspend. Strategy: result cache for repeated dashboards, warehouse cache for interactive analysis sessions.

Performance Tips

Frequently Asked Questions

Should I always use the largest warehouse?

No. Simple queries (point lookups, small aggregations) run identically on XS and 4XL. Scaling helps for: large scans, spilling queries, high concurrency. Profile across sizes to find diminishing returns.

How to check if queries use result cache?

QUERY_HISTORY: bytes_scanned=0 with results returned = cache hit. Bypassed when: data changed, USE_CACHED_RESULT=FALSE, non-deterministic functions, or over 24 hours elapsed.

Related Cheat Sheets

Top 30 Snowflake Interview Questions & AnswersSnowflake Cost Optimization — Expert Interview QuestionsSnowflake Dynamic Tables — Expert Interview Questions
← All Cheat Sheets