Last year, I interviewed for a Senior Data Engineer role at three different companies. All three used Snowflake heavily. All three asked completely different questions.

The first interview? They grilled me on virtual warehouse sizing and cost optimization for 15 minutes. The second? Entirely focused on data modeling and Time Travel. The third? They threw a live coding challenge at me involving complex window functions and variant data types.

I passed two out of three. The one I failed? I bombed a question about how clustering keys actually work under the hood. I knew the basics but couldn’t explain the micro-partitioning details they were looking for.

That failure taught me something: knowing how to USE Snowflake isn’t enough. You need to understand HOW it works and WHY it works that way.

After that, I spent two weeks deep-diving into Snowflake internals, cost optimization, and performance tuning. I documented every question I encountered—not just from my interviews, but from colleagues who interviewed elsewhere, from Reddit posts, from Slack channels.

This guide is the result. These aren’t generic questions you’ll find on every blog. These are real questions from actual 2025-2026 interviews, organized by difficulty and topic, with detailed answers that actually help you understand the concepts.

How to Use This Guide

Don’t try to memorize answers. That’s the worst approach.

Instead:

  1. Read each question and try answering it yourself first
  2. Check my answer and see what you missed
  3. Actually run the SQL examples in Snowflake
  4. Think about WHY each answer matters in production

The questions are organized by difficulty and topic:

  • Fundamentals – Everyone gets asked these
  • Intermediate – Senior engineers should know these cold
  • Advanced – Architect-level and challenging scenarios
  • Scenario-Based – Real-world problem-solving questions
  • Coding Challenges – Live SQL problems

Let’s get started.


Part 1: Fundamentals (Junior to Mid-Level)

These are the baseline questions. If you’re interviewing for any Snowflake role, expect these.

Question 1: Explain Snowflake’s architecture in simple terms. What makes it different from traditional databases?

What they’re really asking: Do you understand the core value proposition of Snowflake?

Answer:

Snowflake uses a unique multi-cluster, shared-data architecture with three separate layers:

  1. Storage Layer: All data stored in cloud storage (S3, Azure Blob, GCS), organized in a proprietary compressed columnar format. Storage is separate from compute, so you pay only for what you store.
  2. Compute Layer: Virtual warehouses (compute clusters) that process queries. These are completely independent—multiple warehouses can query the same data simultaneously without impacting each other. You can scale them up (bigger warehouses) or out (more warehouses) instantly.
  3. Cloud Services Layer: The brain that handles authentication, metadata management, query optimization, and transaction management. This runs in the background and you don’t manage it.

What makes it different:

  • Traditional databases couple storage and compute—if you need more compute, you also pay for more storage. Snowflake separates them.
  • Traditional data warehouses struggle with concurrency. Snowflake’s multi-cluster architecture means the marketing team running reports doesn’t slow down the engineering team loading data.
  • Zero DBA overhead. No indexes to maintain, no vacuum operations, no query plan tuning (mostly).
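Because compute is decoupled from storage, a common pattern is one warehouse per workload against the same tables. A minimal sketch (warehouse and stage names here are illustrative):

```sql
-- One copy of the data, independent compute per team
CREATE WAREHOUSE etl_wh WITH WAREHOUSE_SIZE = 'LARGE' AUTO_SUSPEND = 60;
CREATE WAREHOUSE bi_wh  WITH WAREHOUSE_SIZE = 'SMALL' AUTO_SUSPEND = 300;

USE WAREHOUSE etl_wh;
COPY INTO sales FROM @sales_stage;   -- engineering loads data...

USE WAREHOUSE bi_wh;
SELECT region, SUM(amount)           -- ...while reporting queries the same
FROM sales GROUP BY region;          -- table, neither slowing the other
```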

Why this matters in production: I’ve seen companies cut their data warehouse costs by 60% after migrating from traditional systems, simply because they only spin up large warehouses when needed instead of running them 24/7.


Question 2: What is a virtual warehouse? How does warehouse sizing work?

What they’re really asking: Can you make smart decisions about compute resources and costs?

Answer:

A virtual warehouse is Snowflake’s compute engine—a cluster of servers that execute queries and DML operations. Think of it as your processing power.

Sizing (X-Small to 6X-Large):

  • Each size up doubles the compute resources (and cost)
  • X-Small: 1 credit/hour, 1 server
  • Small: 2 credits/hour, 2 servers
  • Medium: 4 credits/hour, 4 servers
  • Large: 8 credits/hour, 8 servers
  • And so on


Key characteristics:

  • Auto-suspend: Warehouse pauses after specified idle time (no charges)
  • Auto-resume: Automatically starts when query comes in
  • Multi-cluster: Can scale out (add more warehouses) for concurrency
  • Instant resize: Change size without downtime
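None of these settings are fixed at creation time. A quick sketch of adjusting them on the fly (assuming a warehouse named reporting_wh):

```sql
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'MEDIUM';  -- instant resize; running
                                                             -- queries finish on the old size
ALTER WAREHOUSE reporting_wh SET AUTO_SUSPEND = 120;         -- tighten the idle timeout
ALTER WAREHOUSE reporting_wh SUSPEND;                        -- or suspend manually right now
```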

Example scenario:

-- Create warehouse for different workloads
CREATE WAREHOUSE data_loading_wh
WITH 
    WAREHOUSE_SIZE = 'LARGE'  -- Big for bulk loads
    AUTO_SUSPEND = 60         -- Stop after 1 min idle
    AUTO_RESUME = TRUE
    INITIALLY_SUSPENDED = TRUE;

CREATE WAREHOUSE reporting_wh
WITH 
    WAREHOUSE_SIZE = 'SMALL'   -- Small for simple reports
    AUTO_SUSPEND = 300         -- Stop after 5 mins idle
    AUTO_RESUME = TRUE
    MIN_CLUSTER_COUNT = 1
    MAX_CLUSTER_COUNT = 3;     -- Scale out if needed

Real-world insight: I once saw a company running an X-Large warehouse 24/7 for a dashboard that only got queried 10 times a day. Switched to Small with auto-suspend—saved $15K/month.

Common misconception: Bigger warehouses aren’t always faster. If your query processes 1MB of data, a 6X-Large warehouse won’t finish much faster than a Small one. Size up only when you’re processing large data volumes or have complex queries.


Question 3: Explain Time Travel. What’s the difference between Time Travel and Fail-safe?

What they’re really asking: Do you understand Snowflake’s data recovery features?

Answer:

Time Travel lets you query, clone, or restore historical data within a retention period (0-90 days, depending on edition):

-- Query data as it was 5 minutes ago
SELECT * FROM orders AT(OFFSET => -60*5);

-- Query data as it was at specific timestamp
SELECT * FROM orders AT(TIMESTAMP => '2026-01-15 10:00:00'::TIMESTAMP_LTZ);

-- Restore dropped table (within retention period)
UNDROP TABLE orders;

-- Clone table as it was yesterday
CREATE TABLE orders_yesterday CLONE orders 
AT(OFFSET => -60*60*24);

Retention periods:

  • Standard Edition: 1 day (default, max 1 day)
  • Enterprise Edition: 1 day (default, max 90 days)
  • Can be set at account, database, schema, or table level
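For example, retention is tuned per object with the DATA_RETENTION_TIME_IN_DAYS parameter (object names are illustrative; 90 days requires Enterprise Edition):

```sql
ALTER DATABASE analytics SET DATA_RETENTION_TIME_IN_DAYS = 30;
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 90;

-- Transient tables cap out at 1 day of Time Travel and skip Fail-safe entirely
CREATE TRANSIENT TABLE staging_orders (id INT) DATA_RETENTION_TIME_IN_DAYS = 0;
```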

Fail-safe is a 7-day recovery period AFTER Time Travel expires:

  • You can’t access data yourself
  • Requires Snowflake Support to recover
  • Last resort for disaster recovery
  • No fee for the feature itself, but you pay for the storage it retains

Key differences:

Feature   | Time Travel              | Fail-safe
----------|--------------------------|---------------------------
Access    | You control it           | Snowflake Support only
Duration  | 0-90 days (configurable) | 7 days (fixed)
Purpose   | Query, clone, restore    | Disaster recovery
Cost      | Included                 | Included (storage charged)

Real scenario I encountered:

Someone accidentally ran DELETE FROM customer_orders WHERE 1=1 instead of deleting test data. Panic ensued. Time Travel saved us:

-- Check when the bad delete happened
SELECT * FROM customer_orders 
AT(TIMESTAMP => '2026-01-20 14:30:00'::TIMESTAMP_LTZ)
LIMIT 10;

-- Restore the table to before the delete
CREATE TABLE customer_orders_backup AS 
SELECT * FROM customer_orders 
AT(TIMESTAMP => '2026-01-20 14:29:00'::TIMESTAMP_LTZ);

-- Verify, then atomically swap the good copy into place
ALTER TABLE customer_orders SWAP WITH customer_orders_backup;
DROP TABLE customer_orders_backup;  -- now holds the bad data

Total recovery time: 5 minutes. My hero moment.


Question 4: What are micro-partitions? How does Snowflake organize data?

What they’re really asking: Do you understand how Snowflake stores data under the hood?

Answer:

Micro-partitions are Snowflake’s secret sauce for query performance.

What they are:

  • Immutable blocks of compressed columnar data
  • Typically 50-500MB uncompressed (16MB compressed average)
  • Automatically created when you load data
  • Organized by ingestion order (unless clustering key defined)

Key characteristics:

  1. Column-oriented: Each column stored separately—only read columns you query
  2. Compressed: Automatic compression (often 10:1 ratio)
  3. Immutable: Never modified, only replaced
  4. Metadata-rich: Store min/max values, distinct counts, NULL counts

How Snowflake uses them for pruning:

-- Table with 1 billion rows in 10,000 micro-partitions
SELECT COUNT(*) 
FROM sales 
WHERE sale_date = '2026-01-15';

Snowflake doesn’t scan all 10,000 micro-partitions. It:

  1. Checks metadata for each micro-partition
  2. Identifies only partitions containing 2026-01-15
  3. Scans only those (maybe 10-20 partitions)
  4. Returns result from a tiny fraction of data

Example of metadata:

Micro-partition #4532:
- sale_date: MIN = 2026-01-14, MAX = 2026-01-16
- amount: MIN = 5.00, MAX = 9999.99
- region: DISTINCT_VALUES = ['WEST', 'EAST']

Query optimizer sees this partition MIGHT contain 2026-01-15, so it scans it.

Why this matters: Understanding micro-partitions is key to understanding clustering (which we’ll cover later). Bad data organization = scanning too many micro-partitions = slow queries.
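You can actually inspect how well a table's micro-partitions line up with a column using SYSTEM$CLUSTERING_INFORMATION (using the sales table from the example above):

```sql
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(sale_date)');
-- Returns JSON including total_partition_count, average_overlaps, average_depth.
-- High overlaps/depth on your common filter column means poor pruning.
```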


Question 5: What’s the difference between COPY INTO and Snowpipe?

What they’re really asking: Can you choose the right data ingestion method?

Answer:

Both load data into Snowflake, but for different use cases:

COPY INTO (Batch loading):

-- Manual batch load from S3
COPY INTO customer_data
FROM @my_s3_stage/data/customers/
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1);

-- Scheduled with Snowflake task
CREATE TASK load_daily_data
    WAREHOUSE = loading_wh
    SCHEDULE = '1440 MINUTE'  -- Daily
AS
    COPY INTO customer_data
    FROM @my_s3_stage/data/customers/;

Characteristics:

  • You control when it runs
  • Requires a warehouse
  • Good for scheduled batch loads
  • Can load from multiple files at once
  • Transformation during load (limited)
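One COPY INTO feature worth mentioning in interviews: you can dry-run a load with VALIDATION_MODE to surface bad rows before committing anything:

```sql
COPY INTO customer_data
FROM @my_s3_stage/data/customers/
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
VALIDATION_MODE = RETURN_ERRORS;  -- reports parse errors, loads nothing
```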

Snowpipe (Continuous loading):

-- Create pipe for automatic loading
CREATE PIPE customer_data_pipe 
    AUTO_INGEST = TRUE
    AWS_SNS_TOPIC = 'arn:aws:sns:us-east-1:123456789:snowpipe'
AS
    COPY INTO customer_data
    FROM @my_s3_stage/data/customers/
    FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',');

Characteristics:

  • Triggered automatically (S3 event notifications)
  • Uses Snowflake-managed compute (you don’t provide warehouse)
  • Near real-time (typically within minutes)
  • Charged per file processed
  • Perfect for streaming data
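Since Snowpipe runs in the background, monitoring matters. A sketch using the pipe and table from above:

```sql
-- Is the pipe healthy? How many files are queued?
SELECT SYSTEM$PIPE_STATUS('customer_data_pipe');

-- What actually landed in the last hour?
SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'customer_data',
    START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())
));
```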

When to use each:

Use Case                              | Use This
--------------------------------------|--------------------
Daily batch loads from vendor         | COPY INTO with TASK
IoT sensor data arriving continuously | Snowpipe
One-time historical data load         | COPY INTO
Log files dropped every 5 minutes     | Snowpipe
ETL job runs nightly at 2 AM          | COPY INTO

Cost consideration:

COPY INTO uses your warehouse credits. Snowpipe uses Snowflake-managed serverless compute, billed per second, plus a file overhead charge of 0.06 credits per 1,000 files. For a continuous trickle of small files, Snowpipe is usually cheaper; for big batch loads, COPY INTO is cheaper.

Real example:

We had application logs being written to S3 every minute. Originally used a Snowflake task running COPY INTO every 5 minutes on a Small warehouse:

  • Cost: with runs every 5 minutes the warehouse rarely suspended, so ~2 credits/hour × 24 hours = 48 credits/day × $2/credit = $96/day

Switched to Snowpipe:

  • ~1,440 files per day (one per minute)
  • File overhead: 1,440 / 1,000 × 0.06 ≈ 0.09 credits/day (about $0.17), plus a small serverless compute charge

All in, well under a dollar a day. Saved roughly $95 per day, close to $2,900 a month.


Question 6: Explain cloning in Snowflake. How does zero-copy cloning work?

What they’re really asking: Do you understand one of Snowflake’s coolest features?

Answer:

Cloning creates a copy of an object (table, schema, database) that initially shares the underlying storage with the source. No data is duplicated until changes are made.

How it works:

-- Clone a production table for testing
CREATE TABLE orders_test CLONE orders;

-- Clone entire schema
CREATE SCHEMA dev_schema CLONE prod_schema;

-- Clone at specific point in time
CREATE TABLE orders_yesterday CLONE orders
AT(TIMESTAMP => '2026-01-19 00:00:00'::TIMESTAMP_LTZ);

Under the hood:

  1. Snowflake copies metadata (pointers to micro-partitions)
  2. No actual data is copied—both tables point to same micro-partitions
  3. When you modify clone, only changed micro-partitions are created
  4. Original and clone now share unchanged partitions, have separate changed ones

Example scenario:

-- Production table: 1TB of data
CREATE TABLE orders_test CLONE orders;  
-- Completes in seconds, costs almost nothing

-- Modify test table
DELETE FROM orders_test WHERE order_date < '2025-01-01';
-- Only affected micro-partitions are rewritten

-- Production table unchanged
-- Test table now has its own micro-partitions for deleted data range
-- Both tables still share micro-partitions for unchanged data

Why this is powerful:

Before Snowflake (traditional databases):

  • Want test environment? Copy 1TB of data → wait hours → pay for 1TB more storage
  • “Can I test this change safely?” → “No, too risky”

With Snowflake:

  • Clone in 5 seconds
  • Test destructive changes safely
  • Storage cost = only what you change

Real use cases:

  1. Testing schema changes:
-- Clone production
CREATE TABLE orders_test CLONE orders;

-- Test adding column
ALTER TABLE orders_test ADD COLUMN customer_lifetime_value NUMBER(10,2);

-- Test works? Apply to prod. Doesn't work? Drop clone.
  2. Debug production issues:
-- Customer reports data looks wrong on Jan 15
CREATE TABLE orders_jan15 CLONE orders
AT(TIMESTAMP => '2026-01-15 23:59:59'::TIMESTAMP_LTZ);

-- Analyze the snapshot
SELECT * FROM orders_jan15 WHERE customer_id = 'C12345';
  3. Give analysts safe playground:
-- Analysts can't mess up production even if they try
CREATE DATABASE analytics_sandbox CLONE production;
GRANT ALL ON DATABASE analytics_sandbox TO ROLE analyst_role;

Important note: Clones are independent after creation. Changing the original doesn’t affect the clone (and vice versa).


Question 7: What are Snowflake’s data types? Any unusual ones?

What they’re really asking: Have you worked with Snowflake’s semi-structured data features?

Answer:

Snowflake supports standard SQL types plus some powerful ones for semi-structured data:

Standard types:

  • NUMBER (INT, DECIMAL, FLOAT, etc.)
  • STRING (VARCHAR, TEXT—all same internally)
  • BINARY (for file data)
  • BOOLEAN
  • DATE, TIME, TIMESTAMP

Semi-structured types (this is where Snowflake shines):

VARIANT – Stores JSON, XML, Avro, Parquet:

CREATE TABLE api_responses (
    response_id INT,
    response_data VARIANT,
    received_at TIMESTAMP_LTZ
);

-- Insert JSON data
INSERT INTO api_responses
SELECT 
    1,
    PARSE_JSON('{
        "user": {"id": 123, "name": "John"},
        "order": {"total": 99.50, "items": [{"id": 1}, {"id": 2}]}
    }'),
    CURRENT_TIMESTAMP();

-- Query JSON using dot notation
SELECT 
    response_data:user.id::INT as user_id,
    response_data:user.name::STRING as user_name,
    response_data:order.total::NUMBER(10,2) as order_total,
    response_data:order.items[0].id::INT as first_item_id
FROM api_responses;

ARRAY – Ordered list of values:

CREATE TABLE user_preferences (
    user_id INT,
    favorite_categories ARRAY
);

INSERT INTO user_preferences 
VALUES (1, ARRAY_CONSTRUCT('Electronics', 'Books', 'Sports'));

-- Query arrays
SELECT 
    user_id,
    favorite_categories[0]::STRING as top_category,
    ARRAY_SIZE(favorite_categories) as category_count
FROM user_preferences;

OBJECT – Key-value pairs:

CREATE TABLE user_metadata (
    user_id INT,
    metadata OBJECT
);

INSERT INTO user_metadata
VALUES (1, OBJECT_CONSTRUCT(
    'last_login', '2026-01-20',
    'login_count', 45,
    'preferred_language', 'en'
));

-- Query objects
SELECT 
    user_id,
    metadata['last_login']::STRING as last_login,
    metadata['login_count']::INT as login_count
FROM user_metadata;

Why this matters:

In my previous job, we received API responses from 50+ different services, each with different JSON structures. Before Snowflake:

  • Created separate tables for each API (nightmare to maintain)
  • Or flattened everything (lost nested data)

With VARIANT:

  • One table stores all responses
  • Query any structure with dot notation
  • Add new APIs without schema changes

Performance note: Snowflake automatically extracts frequently queried paths inside VARIANT data into its columnar format and keeps statistics on them. Queries on semi-structured data are surprisingly fast (though not quite as fast as native columns).
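One function worth knowing alongside these types is LATERAL FLATTEN, which explodes an array into rows. Using the api_responses example from above:

```sql
SELECT 
    r.response_id,
    item.value:id::INT AS item_id
FROM api_responses r,
LATERAL FLATTEN(input => r.response_data:order.items) item;
-- One output row per element of the items array
```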


Question 8: Explain streams and tasks. How do they work together?

What they’re really asking: Can you build automated data pipelines?

Answer:

Streams track changes (CDC – Change Data Capture) to a table:

-- Create stream on source table
CREATE STREAM customer_changes ON TABLE customers;

-- Make changes to source
UPDATE customers SET status = 'INACTIVE' WHERE last_order_date < '2024-01-01';
INSERT INTO customers VALUES (9999, 'New Customer', 'ACTIVE', CURRENT_DATE());
DELETE FROM customers WHERE customer_id = 1234;

-- Stream captures what changed
SELECT 
    customer_id,
    status,
    METADATA$ACTION,   -- INSERT or DELETE (an UPDATE shows up as a DELETE + INSERT pair)
    METADATA$ISUPDATE, -- TRUE when the row is part of an UPDATE pair
    METADATA$ROW_ID
FROM customer_changes;

Tasks run SQL on a schedule:

-- Create task to process stream
CREATE TASK process_customer_changes
    WAREHOUSE = etl_wh
    SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('customer_changes')  -- Only run if changes exist
AS
    INSERT INTO customer_history
    SELECT 
        customer_id,
        status,
        METADATA$ACTION as change_type,
        CURRENT_TIMESTAMP() as processed_at
    FROM customer_changes;

-- Must explicitly start task
ALTER TASK process_customer_changes RESUME;

How they work together:

  1. Source table changes (INSERT/UPDATE/DELETE)
  2. Stream captures changes
  3. Task wakes up (based on schedule)
  4. Task checks if stream has data
  5. If yes, processes the stream
  6. Stream marks processed rows as consumed

Real-world example – Incremental data pipeline:

-- Source: Raw sales data
CREATE TABLE raw_sales (
    sale_id INT,
    product_id INT,
    amount NUMBER(10,2),
    sale_date DATE
);

-- Stream tracks changes
CREATE STREAM raw_sales_stream ON TABLE raw_sales;

-- Target: Aggregated daily sales
CREATE TABLE daily_sales_summary (
    sale_date DATE,
    total_amount NUMBER(15,2),
    transaction_count INT,
    last_updated TIMESTAMP_LTZ
);

-- Task processes incrementally
CREATE TASK aggregate_daily_sales
    WAREHOUSE = etl_wh
    SCHEDULE = '10 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('raw_sales_stream')
AS
MERGE INTO daily_sales_summary target
USING (
    SELECT 
        sale_date,
        SUM(amount) as total_amount,
        COUNT(*) as transaction_count
    FROM raw_sales_stream
    WHERE METADATA$ACTION = 'INSERT'  -- Only process new records
    GROUP BY sale_date
) source
ON target.sale_date = source.sale_date
WHEN MATCHED THEN 
    UPDATE SET 
        total_amount = target.total_amount + source.total_amount,
        transaction_count = target.transaction_count + source.transaction_count,
        last_updated = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
    INSERT (sale_date, total_amount, transaction_count, last_updated)
    VALUES (source.sale_date, source.total_amount, source.transaction_count, CURRENT_TIMESTAMP());

ALTER TASK aggregate_daily_sales RESUME;

Benefits of this pattern:

  • Only process changed data (efficient)
  • No need to track “last processed timestamp”
  • Automatic scheduling
  • Warehouse auto-suspends when no work

Common gotcha: Streams are consumed when read. If multiple processes need the same changes, create multiple streams or capture to an intermediate table first.
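The usual fix for the multi-consumer case is simply one stream per consumer, since each stream keeps its own offset (stream names here are illustrative):

```sql
-- Each consumer gets its own stream; reading one doesn't consume the other
CREATE STREAM customer_changes_for_history ON TABLE customers;
CREATE STREAM customer_changes_for_alerts  ON TABLE customers;
```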


Question 9: What’s the difference between a view, materialized view, and secure view?

What they’re really asking: Do you understand Snowflake’s different abstraction options?

Answer:

Regular View – Stored query, executed every time:

CREATE VIEW active_customers AS
SELECT 
    customer_id,
    customer_name,
    total_orders,
    last_order_date
FROM customers
WHERE status = 'ACTIVE';

-- Every query on this view re-executes the SELECT
SELECT * FROM active_customers WHERE total_orders > 10;

Pros: Always current, no storage cost
Cons: Can be slow for complex queries, repeated computation

Materialized View – Query results stored and refreshed:

CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT 
    sale_date,
    product_category,
    SUM(amount) as total_sales,
    COUNT(*) as transaction_count
FROM sales
GROUP BY sale_date, product_category;

-- Query hits stored results (fast!)
SELECT * FROM daily_sales_mv WHERE sale_date = CURRENT_DATE();

Pros: Fast queries (pre-computed), Snowflake handles refresh automatically
Cons: Storage cost, slight data lag (refresh takes time), only supports certain query types

When Snowflake refreshes: Automatically in the background when base tables change. There is no manual refresh command; you can only pause or resume the background maintenance:

-- Check refresh status (the behind_by column shows lag)
SHOW MATERIALIZED VIEWS LIKE 'daily_sales_mv';

-- Pause/resume automatic maintenance
ALTER MATERIALIZED VIEW daily_sales_mv SUSPEND;
ALTER MATERIALIZED VIEW daily_sales_mv RESUME;

Secure View – Hides definition and optimizes differently:

CREATE SECURE VIEW customer_sensitive_data AS
SELECT 
    customer_id,
    customer_name,
    -- Hide sensitive logic
    CASE 
        WHEN credit_score > 750 THEN 'Excellent'
        WHEN credit_score > 650 THEN 'Good'
        ELSE 'Fair'
    END as credit_rating
FROM customers;

-- Users can query but can't see the view definition
-- Can't use SHOW VIEW or GET_DDL to see the logic

Pros: Data governance, prevents data leakage through query optimization
Cons: Slightly less performance optimization (Snowflake can’t push down predicates as aggressively)

Real example – Why secure views matter:

-- Non-secure view (BAD for sensitive data)
CREATE VIEW employee_salaries AS
SELECT employee_id, department, salary
FROM employees
WHERE department = CURRENT_USER();  -- Only show user's department

-- Problem: Snowflake optimizes queries
-- User queries: SELECT * FROM employee_salaries WHERE salary > 100000
-- Snowflake might rewrite as: SELECT * FROM employees WHERE salary > 100000 AND department = CURRENT_USER()
-- Query plan visible to user might reveal salary data from other departments!

-- Secure view (GOOD)
CREATE SECURE VIEW employee_salaries_secure AS
SELECT employee_id, department, salary
FROM employees
WHERE department = CURRENT_USER();

-- Query plans don't reveal underlying data

When to use each:

Type              | Use Case
------------------|----------------------------------------------------------------------
View              | Simple transformations, always need current data
Materialized View | Complex aggregations queried frequently, can tolerate slight lag
Secure View       | Sensitive data with row-level security or proprietary business logic

Question 10: How do you handle slowly changing dimensions (SCD) in Snowflake?

What they’re really asking: Do you know dimensional modeling and Snowflake-specific approaches?

Answer:

Snowflake’s Time Travel and Streams make SCD patterns easier than traditional approaches.

Type 1 SCD (Overwrite – No history):

MERGE INTO dim_customer target
USING stage_customer source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN 
    UPDATE SET 
        target.customer_name = source.customer_name,
        target.email = source.email,
        target.updated_date = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
    INSERT (customer_id, customer_name, email, created_date)
    VALUES (source.customer_id, source.customer_name, source.email, CURRENT_TIMESTAMP());

Type 2 SCD (Keep full history with effective dates):

-- Dimension table structure
CREATE TABLE dim_customer_scd2 (
    customer_key INT AUTOINCREMENT,  -- Surrogate key
    customer_id INT,                  -- Natural key
    customer_name STRING,
    email STRING,
    address STRING,
    effective_start_date DATE,
    effective_end_date DATE,
    is_current BOOLEAN
);

-- Processing new records
MERGE INTO dim_customer_scd2 target
USING (
    SELECT 
        s.*,
        d.customer_key as existing_key
    FROM stage_customer s
    LEFT JOIN dim_customer_scd2 d
        ON s.customer_id = d.customer_id 
        AND d.is_current = TRUE
) source
ON target.customer_key = source.existing_key
-- Expire the current version when a tracked attribute changed
WHEN MATCHED AND (
    target.customer_name != source.customer_name 
    OR target.email != source.email 
    OR target.address != source.address
) THEN 
    UPDATE SET 
        effective_end_date = CURRENT_DATE(),
        is_current = FALSE
-- Unchanged rows fall through untouched
-- Brand-new customers (no current version) get their first row
WHEN NOT MATCHED THEN
    INSERT (customer_id, customer_name, email, address, effective_start_date, effective_end_date, is_current)
    VALUES (source.customer_id, source.customer_name, source.email, source.address, CURRENT_DATE(), '9999-12-31'::DATE, TRUE);

-- Insert new version for changed records (second pass)
INSERT INTO dim_customer_scd2 (customer_id, customer_name, email, address, effective_start_date, effective_end_date, is_current)
SELECT 
    source.customer_id,
    source.customer_name,
    source.email,
    source.address,
    CURRENT_DATE() as effective_start_date,
    '9999-12-31'::DATE as effective_end_date,
    TRUE as is_current
FROM stage_customer source
INNER JOIN dim_customer_scd2 target
    ON source.customer_id = target.customer_id
    AND target.is_current = FALSE  -- Was just expired
    AND target.effective_end_date = CURRENT_DATE()  -- Today
WHERE NOT EXISTS (
    SELECT 1 FROM dim_customer_scd2 
    WHERE customer_id = source.customer_id 
    AND is_current = TRUE
);

Snowflake-specific approach using Streams:

-- Stream tracks all changes automatically
CREATE STREAM customer_changes_stream ON TABLE source_customers;

-- Process SCD Type 2 with stream metadata
CREATE TASK process_customer_scd
    WAREHOUSE = etl_wh
    SCHEDULE = '1 HOUR'
    WHEN SYSTEM$STREAM_HAS_DATA('customer_changes_stream')
AS
BEGIN
    -- Expire changed records
    UPDATE dim_customer_scd2 
    SET 
        effective_end_date = CURRENT_DATE() - 1,
        is_current = FALSE
    WHERE customer_id IN (
        SELECT customer_id 
        FROM customer_changes_stream 
        WHERE METADATA$ACTION = 'INSERT'  -- updates arrive as DELETE + INSERT pairs
    )
    AND is_current = TRUE;

    -- Insert new versions (explicit column list; customer_key auto-increments)
    INSERT INTO dim_customer_scd2 (customer_id, customer_name, email, address, effective_start_date, effective_end_date, is_current)
    SELECT 
        customer_id,
        customer_name,
        email,
        address,
        CURRENT_DATE() as effective_start_date,
        '9999-12-31'::DATE as effective_end_date,
        TRUE as is_current
    FROM customer_changes_stream
    WHERE METADATA$ACTION = 'INSERT';
END;

Type 3 SCD (Limited history – previous value columns):

CREATE TABLE dim_customer_scd3 (
    customer_id INT PRIMARY KEY,
    customer_name STRING,
    current_email STRING,
    previous_email STRING,
    email_changed_date DATE,
    current_address STRING,
    previous_address STRING,
    address_changed_date DATE
);

MERGE INTO dim_customer_scd3 target
USING stage_customer source
ON target.customer_id = source.customer_id
WHEN MATCHED AND target.current_email != source.email THEN
    UPDATE SET 
        previous_email = target.current_email,
        current_email = source.email,
        email_changed_date = CURRENT_DATE()
WHEN MATCHED AND target.current_address != source.address THEN
    UPDATE SET
        previous_address = target.current_address,
        current_address = source.address,
        address_changed_date = CURRENT_DATE();
-- Gotcha: only the first matching WHEN clause fires for a given row. If email
-- AND address change in the same batch, handle both in a single clause.

My recommendation: Use SCD Type 2 with Streams. It’s clean, automated, and leverages Snowflake’s strengths. Plus Time Travel gives you an audit trail even beyond your