Advanced • Last updated: 2026-04-09 • 3 sections
Expert questions on Snowpipe Streaming, Kafka connector, classic Snowpipe, and real-time ingestion patterns.
Q: Streaming vs classic Snowpipe?
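Classic: file-based. Files land in a stage, a cloud event notification (for example SQS on AWS) triggers the pipe, and the pipe runs COPY INTO; 30-120s latency. Streaming: row-based. The Ingest SDK (Java) writes rows with insertRows(); 1-10s latency, no files or stages. Streaming is cheaper for high-volume small payloads because there is no per-file overhead.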
Q: Kafka connector and Streaming mode?
snowflake-kafka-connector has two modes: Snowpipe (buffers to files, 1-3 min) or Streaming (set snowflake.ingestion.method=SNOWPIPE_STREAMING, sub-10s). In Streaming mode each Kafka partition maps to one channel, which gives ordered, exactly-once ingestion per partition.
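A minimal sketch of what a Streaming-mode connector definition might look like, written as the JSON payload that the Kafka Connect REST API accepts. The connector name `events_sink`, the topic `events`, and every value in angle brackets are hypothetical placeholders; the converter choice is an assumption for a schemaless JSON topic. The full name of the mode switch is `snowflake.ingestion.method`.

```python
import json


def streaming_connector_config(name: str, topics: list, tasks_max: int) -> dict:
    """Build a Kafka Connect payload for the Snowflake sink in Streaming mode."""
    return {
        "name": name,
        "config": {
            "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
            "topics": ",".join(topics),
            "tasks.max": str(tasks_max),
            # Switches the connector from file-based Snowpipe to row-based Streaming.
            "snowflake.ingestion.method": "SNOWPIPE_STREAMING",
            # Placeholders: replace with real account details before use.
            "snowflake.url.name": "<account>.snowflakecomputing.com:443",
            "snowflake.user.name": "<user>",
            "snowflake.private.key": "<private-key>",
            "snowflake.role.name": "<role>",
            "snowflake.database.name": "<database>",
            "snowflake.schema.name": "<schema>",
            # Assumed converters for a schemaless JSON topic.
            "key.converter": "org.apache.kafka.connect.storage.StringConverter",
            "value.converter": "org.apache.kafka.connect.json.JsonConverter",
            "value.converter.schemas.enable": "false",
        },
    }


payload = streaming_connector_config("events_sink", ["events"], tasks_max=6)
print(json.dumps(payload, indent=2))
```

The printed JSON could be POSTed to a Connect worker's `/connectors` endpoint; leaving out `snowflake.ingestion.method` would fall back to the file-based Snowpipe mode.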
Q: How does exactly-once delivery work?
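Channel-based offset tracking: insertRows(batch, offsetToken) stores the rows and the offset token atomically. On recovery, getLatestCommittedOffsetToken() returns the last committed token; resume from offset+1. If the client crashes before a batch's token is recorded, that batch is sent again on restart. Without the token check this is at-least-once, so keep downstream processing idempotent.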
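The recovery loop can be illustrated with a toy in-memory model. This is not the Ingest SDK (which is Java); `InMemoryChannel`, `resume_offset`, and `ingest` are hypothetical names that only mimic the contract described above: rows and the offset token commit together, and a restarted client resumes from the committed token plus one. Offsets are assumed to be integers, as in Kafka.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryChannel:
    """Toy stand-in for a streaming channel: rows and token commit together."""
    rows: list = field(default_factory=list)
    committed_token: Optional[str] = None

    def insert_rows(self, batch: list, offset_token: str) -> None:
        # Build the new row list first, then swap both fields in one statement,
        # so rows are never stored without their offset token.
        new_rows = self.rows + list(batch)
        self.rows, self.committed_token = new_rows, offset_token

    def get_latest_committed_offset_token(self) -> Optional[str]:
        return self.committed_token


def resume_offset(channel: InMemoryChannel) -> int:
    """First source offset that still needs to be sent."""
    token = channel.get_latest_committed_offset_token()
    return 0 if token is None else int(token) + 1


def ingest(channel, log, batch_size, crash_after_batches=None):
    """Send log entries from the resume point; optionally simulate a crash."""
    sent = 0
    for i in range(resume_offset(channel), len(log), batch_size):
        if crash_after_batches is not None and sent == crash_after_batches:
            raise RuntimeError("simulated crash")
        batch = log[i:i + batch_size]
        # The token is the offset of the last row in the batch.
        channel.insert_rows(batch, str(i + len(batch) - 1))
        sent += 1


log = [f"event-{n}" for n in range(10)]
channel = InMemoryChannel()
try:
    ingest(channel, log, batch_size=3, crash_after_batches=2)
except RuntimeError:
    pass  # the client died after two committed batches
ingest(channel, log, batch_size=3)  # restart: resumes from token + 1
```

After the restart the channel holds each event exactly once, because the second run starts at the offset following the last committed token rather than at the beginning of the log.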
Q: Design ingestion for 50K events/sec from Kafka?
50K eps at 1KB each = 50MB/s. Kafka Connect in Streaming mode, tasks.max=6 (about 10K eps per task, which leaves roughly 20% headroom). CLUSTER BY ingestion timestamp. Buffer: buffer.flush.time=10 (seconds), buffer.count.records=10000. Monitor SNOWPIPE_STREAMING_CLIENT_HISTORY. Downstream: Dynamic Tables or streams+tasks.
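The sizing arithmetic above can be checked with a few lines. This is a back-of-the-envelope sketch: `size_pipeline` is a hypothetical helper, 1KB is taken as 1,000 bytes, and the 20% headroom figure is an assumption chosen to explain why six tasks are used rather than five.

```python
def size_pipeline(events_per_sec: int, event_bytes: int,
                  per_task_eps: int, headroom_pct: int = 20):
    """Return (throughput in MB/s, number of connector tasks)."""
    throughput_mb_s = events_per_sec * event_bytes / 1_000_000
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = events_per_sec * (100 + headroom_pct)
    capacity = per_task_eps * 100
    tasks = -(-needed // capacity)
    return throughput_mb_s, tasks


mb_s, tasks = size_pipeline(50_000, 1_000, 10_000)
print(f"{mb_s} MB/s, tasks.max={tasks}")
```

With zero headroom the same inputs give five tasks, so the sixth task is the safety margin for bursts and rebalances.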
Q: When choose Streaming vs Snowpipe vs COPY INTO?
Streaming: sub-30s latency, continuous rows, small payloads. Snowpipe: file-based arrival, large batches, 1-3 min acceptable. COPY INTO: one-time bulk loads and backfills. Streaming avoids small-file performance penalty.
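The decision rule can be written down as a small function. This is only a sketch of the rule of thumb above; `choose_ingestion` and its parameters are hypothetical, and the 60-second cut-off is an assumption taken from the "1-3 min acceptable" bound for Snowpipe.

```python
def choose_ingestion(latency_budget_s: float, arrives_as_files: bool,
                     one_time_load: bool) -> str:
    """Pick an ingestion method from the rule of thumb in the answer above."""
    if one_time_load:
        # Bulk loads and backfills do not need a continuous pipeline.
        return "COPY INTO"
    if arrives_as_files and latency_budget_s >= 60:
        # Data already lands as files and minute-level latency is acceptable.
        return "Snowpipe"
    # Continuous rows, or a latency budget that file-based loading cannot meet.
    return "Snowpipe Streaming"


print(choose_ingestion(10, arrives_as_files=False, one_time_load=False))
print(choose_ingestion(180, arrives_as_files=True, one_time_load=False))
print(choose_ingestion(3600, arrives_as_files=True, one_time_load=True))
```

Note that file-based sources with a tight latency budget also come out as Streaming here, which in practice means changing the producer to emit rows instead of files.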
Q: Schema evolution with Kafka?
Schemaless (JsonConverter): land records in a VARIANT column and parse downstream with Dynamic Tables. Schema-aware (Avro+Registry): with snowflake.enable.schematization=TRUE on the connector and ENABLE_SCHEMA_EVOLUTION=TRUE on the table, new columns are added automatically. Incompatible type changes fail. Best: VARIANT for maximum flexibility, schema-on-read.
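The two outcomes of schema evolution (new columns are added, incompatible type changes fail) can be modelled in a few lines. This is a toy model, not the connector's implementation; `evolve` is a hypothetical function and Python type names stand in for column types.

```python
def evolve(schema: dict, record: dict) -> dict:
    """Return the schema extended with any new columns found in the record.

    Raises TypeError when an existing column arrives with a different type.
    """
    new_schema = dict(schema)  # leave the caller's schema untouched
    for column, value in record.items():
        observed = type(value).__name__
        if column not in new_schema:
            new_schema[column] = observed  # additive change: auto-add column
        elif new_schema[column] != observed:
            raise TypeError(
                f"column {column!r}: {new_schema[column]} -> {observed}"
            )
    return new_schema


schema = {"id": "int"}
schema = evolve(schema, {"id": 1, "region": "eu"})  # adds "region"
print(schema)

try:
    evolve(schema, {"id": "not-a-number"})  # type change is rejected
except TypeError as err:
    print("rejected:", err)
```

Landing the same records in a single VARIANT column sidesteps both branches, which is why the schemaless route is the more flexible one.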
Q: Production monitoring?
Snowpipe: PIPE_USAGE_HISTORY, COPY_HISTORY, SYSTEM$PIPE_STATUS. Streaming: SNOWPIPE_STREAMING_CLIENT_HISTORY, SNOWPIPE_STREAMING_FILE_MIGRATION_HISTORY. Both: alert on failures, and monitor freshness by checking MAX(ingestion_timestamp) with a scheduled task.
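The freshness check reduces to comparing the newest ingestion timestamp against the current time. A minimal sketch, assuming the MAX(ingestion_timestamp) value has already been fetched by whatever scheduler runs the check; `is_stale` and the five-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_lag(max_ingestion_ts: datetime, now: datetime) -> timedelta:
    """How far the newest ingested row is behind the current time."""
    return now - max_ingestion_ts


def is_stale(max_ingestion_ts: datetime, now: datetime,
             threshold: timedelta = timedelta(minutes=5)) -> bool:
    """True when the table has not received rows within the threshold."""
    return freshness_lag(max_ingestion_ts, now) > threshold


now = datetime(2026, 4, 9, 12, 0, 0, tzinfo=timezone.utc)
fresh = datetime(2026, 4, 9, 11, 59, 30, tzinfo=timezone.utc)
stale = datetime(2026, 4, 9, 11, 50, 0, tzinfo=timezone.utc)

print(is_stale(fresh, now))
print(is_stale(stale, now))
```

For a Streaming pipeline the threshold would normally sit well above the expected 1-10s latency, so that a brief pause does not page anyone.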
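Q: Can Snowpipe Streaming be used without Kafka?
Yes. The Ingest SDK (Java) lets any application call insertRows() directly, for example IoT gateways or Debezium CDC pipelines. The SDK is Java-only; Python applications use PUT+COPY INTO instead.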
Q: How is Snowpipe Streaming billed?
Per-second client compute (small) plus serverless compute for file migration. No per-row charges. Cheaper than Snowpipe for high-frequency small records.