AdvancedLast updated: 2026-04-27 • 4 sections
Expert questions on VARIANT, OBJECT, ARRAY data types, FLATTEN, LATERAL, JSON/XML/Parquet handling, and schema-on-read patterns.
Q: How does Snowflake store VARIANT data internally and why is it fast?
Snowflake stores VARIANT in an optimized columnar format. When JSON/Avro/Parquet loads into a VARIANT column, Snowflake analyzes the structure and stores frequently-occurring top-level paths as separate internal columns (columnar optimization). This means v:user:name can prune and scan like a native column — partition min/max metadata is maintained per JSON path. Queries on well-structured, consistent JSON schemas perform nearly as fast as native typed columns.
Q: When do you use VARIANT vs OBJECT vs ARRAY?
VARIANT: the practical default — holds any JSON value (object, array, scalar, null). Use for raw ingestion and flexible schemas. OBJECT: explicitly a key-value map. Use when you want to enforce that the column contains a JSON object and for clarity in UDF type signatures. ARRAY: an ordered list. Use for multi-value attributes where order matters. In practice, most teams use VARIANT everywhere because it handles all three cases; OBJECT and ARRAY add type safety and documentation intent.
Q: How do you query nested JSON paths efficiently in Snowflake?
Path access: v:user:address:city (colon-separated). Bracket notation for special chars or dynamic paths: v["event-type"]. Array element: v:items[0]:name. Deep nesting: v:a:b:c:d. Type casting: v:price::FLOAT, v:id::INT. Performance: frequently used paths get columnar optimization — extract them as typed virtual columns in a view for downstream use. For unknown schema: use FLATTEN + GET to dynamically discover keys.
Q: How do you handle schema evolution in semi-structured data ingestion?
Schema-on-read approach: store raw JSON in a VARIANT staging table, extract paths in downstream views or Dynamic Tables. Evolution is automatic — new keys are accessible via path notation immediately. For structured downstream tables: use OBJECT_CONSTRUCT + FLATTEN to extract known paths, add a catch-all VARIANT column for unknown/future fields. For breaking changes (key rename, type change): version the extraction layer as a new view rather than altering the raw table.
Q: How does FLATTEN work and when do you use LATERAL FLATTEN?
FLATTEN is a table function that converts an array or object into a set of rows. SELECT VALUE FROM TABLE(FLATTEN(input => v:items)) returns one row per array element. LATERAL FLATTEN is the inline version: SELECT t.id, f.value FROM my_table t, LATERAL FLATTEN(input => t.tags) f — the LATERAL keyword allows the table function to reference the outer table column. Use FLATTEN for: normalizing arrays into rows, unpacking nested event arrays, exploding product tags or attributes.
Q: How do you aggregate and rebuild JSON structures in SQL?
Build objects: OBJECT_CONSTRUCT('key1', val1, 'key2', val2). Build arrays: ARRAY_AGG(col) to aggregate rows back into an array. Combine: SELECT OBJECT_CONSTRUCT('user_id', id, 'orders', ARRAY_AGG(order_id)) FROM orders GROUP BY id. For nested structures: OBJECT_CONSTRUCT can nest other OBJECT_CONSTRUCT and ARRAY_AGG calls. Use ARRAY_DISTINCT, ARRAY_SORT, ARRAY_SLICE for array manipulation.
Q: How do you load and query Parquet, Avro, and XML in Snowflake?
Parquet/Avro: create a file format (TYPE=PARQUET or TYPE=AVRO), load into a VARIANT column via COPY INTO or Snowpipe. Parquet columns map automatically to VARIANT sub-paths. XML: TYPE=XML in the file format; loads as a VARIANT with XMLGET and XPATH-style path access. For Parquet with a known schema: use the MATCH_BY_COLUMN_NAME option in COPY INTO to load directly into a typed table instead of VARIANT.
Q: What is STRIP_OUTER_ARRAY and when do you need it?
When a JSON file contains a single top-level array ([{...}, {...}]) and you want each element to load as a separate row (not the whole array as one VARIANT), set STRIP_OUTER_ARRAY=TRUE in the JSON file format. Without it, the entire file loads as a single VARIANT cell containing the array. With it, each object in the array becomes a separate row. This is the most common JSON ingestion issue when records come as array-wrapped batches.
Both, in layers. Ingest raw JSON into a VARIANT staging table for flexibility and resilience to schema changes. Extract frequently queried paths into typed columns in a downstream table or view for query performance and type safety. The two-layer pattern (raw VARIANT → typed extraction) is the production standard: the raw layer is the source of truth, the typed layer is the query-optimized serving layer.
JSON null: a JSON key explicitly set to null stores as a SQL NULL in Snowflake. Missing key: accessing a path that does not exist returns SQL NULL (not an error). This means IS NULL is true for both explicitly-null and missing paths. To distinguish: use TYPEOF(v:key) — returns 'null_value' for JSON null and NULL for a missing path. Important for data quality: always check TYPEOF in data validation to differentiate absent from explicitly-nulled fields.
Columnar optimization covers frequently occurring, consistently named top-level paths — these query at near-native column speed. Deeply nested paths (4+ levels) and infrequently occurring keys are stored as raw bytes and scanned linearly — slower. FLATTEN on large arrays generates high row counts and can be memory-intensive. Mitigation: extract the most queried paths into a typed materialized table, use FLATTEN only in staging pipelines not in BI-facing queries, cluster on frequently filtered VARIANT path values by extracting them to a real column.