Snowpipe Streaming & Kafka — Expert Interview Questions

Advanced • Last updated: 2026-04-09 • 3 sections

Expert questions on Snowpipe Streaming, Kafka connector, classic Snowpipe, and real-time ingestion patterns.

Streaming Architecture

Q: Streaming vs classic Snowpipe?

Classic Snowpipe is file-based: a cloud-event notification (e.g., SQS) triggers a serverless COPY INTO, giving roughly 30-120s latency. Snowpipe Streaming is row-based: the Java Ingest SDK writes rows over a channel with insertRows(), achieving 1-10s latency with no files or stages. For high-volume streams of small payloads it is also cheaper, since there is no per-file overhead.

Q: Kafka connector and Streaming mode?

The snowflake-kafka-connector has two ingestion modes: Snowpipe (buffers records to internal files, 1-3 min latency) and Streaming (set snowflake.ingestion.method=SNOWPIPE_STREAMING, sub-10s latency). Each Kafka topic partition maps to exactly one streaming channel, which preserves ordering and enables exactly-once delivery per partition.
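A minimal Streaming-mode connector config might look like the sketch below. Account, topic, database, and role values are placeholders, and authentication properties (e.g., the private key) are omitted; snowflake.ingestion.method is the switch that selects Streaming mode.

```properties
# Sketch of a snowflake-kafka-connector config in Streaming mode
# (placeholder names; auth properties such as snowflake.private.key omitted)
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
snowflake.ingestion.method=SNOWPIPE_STREAMING
snowflake.url.name=myaccount.snowflakecomputing.com
snowflake.user.name=KAFKA_INGEST
snowflake.role.name=INGEST_ROLE
snowflake.database.name=RAW
snowflake.schema.name=EVENTS
topics=clickstream
tasks.max=4
buffer.flush.time=10
buffer.count.records=10000
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
```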

Q: How does exactly-once delivery work?

Channel-based offset tracking: insertRows(batch, offsetToken) commits the rows and the offset token atomically. On recovery, getLatestCommittedOffsetToken() returns the last committed token, and the client resumes from offset + 1. If the producer crashes before recording its own progress, semantics degrade to at-least-once, so keep downstream processing idempotent.
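The recovery handshake reduces to simple offset arithmetic. A minimal sketch (resume_offset is a hypothetical helper; the real SDK calls are insertRows() and getLatestCommittedOffsetToken() on a channel object):

```python
from typing import Optional

def resume_offset(last_committed_token: Optional[str]) -> int:
    """Compute the Kafka offset to resume from after reopening a channel.

    last_committed_token mimics getLatestCommittedOffsetToken(): the SDK
    returns the token of the last atomically committed batch, or null if
    the channel has never committed anything.
    """
    if last_committed_token is None:
        return 0                           # fresh channel: start from the beginning
    return int(last_committed_token) + 1   # resume just past the committed batch

# Crash-recovery examples:
assert resume_offset("41") == 42   # batch through offset 41 was committed
assert resume_offset(None) == 0    # nothing committed yet
```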

Q: Design ingestion for 50K events/sec from Kafka?

50K events/sec at ~1 KB each is ~50 MB/s. Use Kafka Connect in Streaming mode with tasks.max=6 (roughly 10K events/sec per task, with headroom). Tune the buffer (buffer.flush.time=10, buffer.count.records=10000), add CLUSTER BY on an ingestion timestamp if queries filter on recency, and monitor SNOWPIPE_STREAMING_CLIENT_HISTORY. Build downstream transformations with Dynamic Tables or streams + tasks.
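The sizing arithmetic above can be checked directly. The 1 KB average event size, the ~10K events/sec-per-task throughput, and the 20% headroom factor are assumptions for the exercise, not connector guarantees:

```python
import math

EVENTS_PER_SEC = 50_000
AVG_EVENT_BYTES = 1_024      # assumed average payload size
PER_TASK_EPS = 10_000        # assumed sustainable throughput per connector task
HEADROOM = 1.2               # 20% spare capacity for spikes and rebalances

throughput_mb_s = EVENTS_PER_SEC * AVG_EVENT_BYTES / 1_000_000
tasks_max = math.ceil(EVENTS_PER_SEC * HEADROOM / PER_TASK_EPS)

print(f"{throughput_mb_s:.1f} MB/s, tasks.max={tasks_max}")
# → 51.2 MB/s, tasks.max=6
```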

Q: When choose Streaming vs Snowpipe vs COPY INTO?

Streaming: sub-30s latency, continuous rows, small payloads. Snowpipe: file-based arrival, large batches, 1-3 min acceptable. COPY INTO: one-time bulk loads and backfills. Streaming avoids small-file performance penalty.
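The decision rules above can be summarized as an illustrative helper (the function and its thresholds simply mirror this answer's guidance, not any Snowflake API):

```python
def choose_ingestion(latency_slo_s: float, arrives_as_files: bool,
                     one_time: bool) -> str:
    """Pick an ingestion method per the decision rules in this answer."""
    if one_time:
        return "COPY INTO"              # bulk loads and backfills
    if not arrives_as_files and latency_slo_s <= 30:
        return "Snowpipe Streaming"     # continuous rows, tight latency SLO
    return "Snowpipe"                   # file-based arrival, minutes acceptable

assert choose_ingestion(10, arrives_as_files=False, one_time=False) == "Snowpipe Streaming"
assert choose_ingestion(180, arrives_as_files=True, one_time=False) == "Snowpipe"
assert choose_ingestion(0, arrives_as_files=False, one_time=True) == "COPY INTO"
```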

Advanced Patterns

Q: Schema evolution with Kafka?

Schemaless (JsonConverter): land records in a VARIANT column and parse downstream, e.g. with Dynamic Tables. Schema-aware (Avro + Schema Registry): with schematization enabled, the connector adds new columns automatically as the schema evolves; incompatible type changes still fail. A common default is VARIANT with schema-on-read for maximum flexibility.
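A sketch of the schema-on-read pattern, assuming the connector's default RECORD_CONTENT VARIANT column; the table name, lag, warehouse, and JSON field paths are hypothetical:

```sql
-- Parse the raw VARIANT landing table downstream with a Dynamic Table
-- (field names below are illustrative, not a fixed connector schema).
CREATE OR REPLACE DYNAMIC TABLE events_parsed
  TARGET_LAG = '1 minute'
  WAREHOUSE = transform_wh
AS
SELECT
  record_content:event_id::STRING   AS event_id,
  record_content:device:id::STRING  AS device_id,
  record_content:ts::TIMESTAMP_NTZ  AS event_ts,
  record_content                    AS raw   -- keep the full payload
FROM raw_events;
```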

Q: Production monitoring?

Snowpipe: ACCOUNT_USAGE.PIPE_USAGE_HISTORY, COPY_HISTORY, and SYSTEM$PIPE_STATUS. Streaming: SNOWPIPE_STREAMING_CLIENT_HISTORY and SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY. In both cases, alert on load failures and track freshness, e.g. by checking MAX(ingestion_timestamp) on a schedule.
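One way to implement the freshness check is a Snowflake alert (an equivalent scheduled task works too). Table, column, integration, recipient, and the 120-second threshold below are all placeholders:

```sql
-- Hypothetical freshness alert: fire if no rows landed in the last 120s.
CREATE OR REPLACE ALERT ingest_freshness_alert
  WAREHOUSE = monitor_wh
  SCHEDULE = '5 MINUTE'
  IF (EXISTS (
    SELECT 1
    FROM raw_events
    HAVING TIMEDIFF('second', MAX(ingestion_timestamp), CURRENT_TIMESTAMP()) > 120
  ))
  THEN CALL SYSTEM$SEND_EMAIL(
    'my_email_integration',
    'oncall@example.com',
    'Ingestion stalled',
    'No rows ingested into raw_events in the last 120 seconds.'
  );
```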

Ingestion Tips

Frequently Asked Questions

Streaming without Kafka?

Yes. The Snowflake Ingest SDK (Java) lets any application open a channel and call insertRows() directly — common for IoT gateways and Debezium-based CDC pipelines. The SDK is Java-only; from Python, the usual fallback is staging files with PUT and loading via COPY INTO.

Cost model?

Per-second compute billed for each active streaming client (small), plus serverless compute for background file migration. There are no per-row charges, which makes Streaming cheaper than classic Snowpipe for high-frequency small records, where per-file overhead dominates.

Related Cheat Sheets

Snowflake Streams & Tasks — Expert Interview Questions
Top 30 Snowflake Interview Questions & Answers
Snowflake Cost Optimization — Expert Interview Questions