AdvancedLast updated: 2026-04-09 • 3 sections
Expert questions on micro-partition pruning, clustering keys, search optimization, query profiling, spilling to disk, and caching layers.
Q: Explain micro-partition architecture and performance impact.
Immutable columnar files 50-500MB uncompressed. Metadata per column: min/max, distinct count, null count. Performance depends on partition pruning: WHERE clause checks metadata to skip non-matching partitions. Chronologically loaded tables prune well on date. Random-order tables prune poorly, needs clustering.
Q: How does CLUSTER BY work and how to choose keys?
Co-locates rows with similar values in same partitions via background reclustering. Choose: most frequent WHERE/JOIN columns, low-to-medium cardinality first, max 3-4 columns. When: tables over 1TB with over 50% partition scans, CLUSTERING_INFORMATION depth over 5-10. Skip: under 100GB, varied query patterns, high write churn.
Q: Query scans 500GB, returns 100 rows, 0% pruning. How to fix?
Add CLUSTER BY on filter column. High-cardinality filter (user_id)? Use Search Optimization Service for point lookups. For joins: cluster both tables on join key. Quick fix: materialize filtered subset.
Q: Explain spilling to local disk vs remote storage.
When intermediate results exceed warehouse memory: local SSD (2-5x slower), then remote cloud storage (10-50x slower). Query Profile shows Bytes Spilled. Fix: scale up warehouse (doubles memory per size), reduce result set, cluster join keys, add LIMIT for sorts. Remote spill over 0 means severely memory-constrained.
Q: When does Search Optimization outperform clustering?
SOS builds bloom-filter-like indexes. Better for: high-cardinality point lookups (user_id, email), LIKE/ILIKE substring searches, geospatial queries, secondary columns on already-clustered tables. Serverless background cost, only worthwhile for frequent selective lookups.
Q: How to diagnose a slow query with Query Profile?
Systematic: (1) Time breakdown: compilation vs queuing vs execution. Queuing = warehouse overloaded. (2) Most expensive operator (thickest arrow): TableScan (poor pruning), JoinFilter (large intermediates), Sort (spill), Aggregate (high cardinality). (3) Per operator: partitions scanned vs total, bytes spilled, rows produced vs input. (4) Anti-patterns: cartesian joins, UNION instead of UNION ALL.
Q: Explain the three cache layers.
Result cache: full query results 24hrs, zero credits, invalidated on data change. Metadata cache: min/max/count per partition, COUNT(*) and MIN/MAX instant, always active. Warehouse cache: warm SSD of recently read partitions, cleared on suspend. Strategy: result cache for repeated dashboards, warehouse cache for interactive analysis sessions.
No. Simple queries (point lookups, small aggregations) run identically on XS and 4XL. Scaling helps for: large scans, spilling queries, high concurrency. Profile across sizes to find diminishing returns.
QUERY_HISTORY: bytes_scanned=0 with results returned = cache hit. Bypassed when: data changed, USE_CACHED_RESULT=FALSE, non-deterministic functions, or over 24 hours elapsed.