AdvancedLast updated: 2026-04-27 • 4 sections
Expert questions on database replication, replication groups, failover, disaster recovery, and cross-region/cross-cloud patterns.
Q: How does Snowflake database replication work internally?
Snowflake uses change-based replication: only micro-partitions modified since the last replication run are transferred to secondary accounts. Cross-region and cross-cloud (AWS to Azure, etc.) are supported. Secondary databases are read-only. Replication runs on a schedule (minimum 1 minute) or on-demand via ALTER DATABASE REFRESH. The secondary account maintains a consistent point-in-time snapshot — mid-refresh queries see the pre-refresh state.
Q: What is the difference between database replication and replication groups?
Database replication: replicates a single database. Simple to set up, appropriate for analytics data sharing or single-database DR. Replication groups: replicate multiple databases, shares, users, roles, and warehouses as an atomic unit with a consistent cross-object snapshot. Use groups for full DR (everything needed to run the application) and when referential integrity across databases matters. Single DB replication is sufficient for read replicas or analytics access.
Q: How would you design a DR architecture with RPO < 5 minutes and RTO < 30 minutes?
Architecture: primary account in us-east-1, secondary in us-west-2. Replication group containing all production databases, roles, warehouses. Schedule replication every 1–2 minutes for RPO < 5 min. Monitor lag via REPLICATION_GROUP_REFRESH_HISTORY. Failover procedure: ALTER FAILOVER GROUP fg PRIMARY (promotes secondary). RTO components: DNS/connection-string switching, application reconnection, cache warming. Test failover quarterly. Total RTO under 30 min requires pre-tested runbooks and DNS TTL ≤ 60s.
Q: What are the cost components of Snowflake replication?
Cost components: (1) Data transfer / egress — charged for bytes transferred across regions or clouds (varies by cloud provider). (2) Replication serverless compute — Snowflake uses serverless credits to run the replication job itself. (3) Secondary storage — the secondary account stores a full copy of the replicated data (storage costs apply to both accounts). Optimization: replicate only databases that need DR (not dev/staging), use longer replication intervals for lower-priority databases to reduce transfer frequency.
Q: Walk through the step-by-step failover procedure for a Snowflake replication group.
Step 1: verify replication is current — query REPLICATION_GROUP_REFRESH_HISTORY, check last_refresh_completed. Step 2: in the secondary account, trigger: ALTER FAILOVER GROUP fg PRIMARY. Step 3: the secondary is promoted to primary (read-write). Step 4: update application connection strings to point to the new primary account URL. Step 5: notify stakeholders of the switch. Step 6: after the incident, fail back by triggering ALTER FAILOVER GROUP fg PRIMARY from the original primary account (now acting as secondary).
Q: How do you test DR without impacting the production primary?
Test failover in a cloned secondary account (not the production DR secondary). Steps: (1) Create a test replication group pointing to a separate test-DR account. (2) Trigger a test failover in the test-DR account. (3) Run application smoke tests against the test-DR account. (4) Reverse the failover back. This tests the runbook without risk to the production DR secondary. For production DR: run a planned quarterly test during a low-traffic window with full stakeholder communication.
Q: How does replication interact with Snowflake data sharing?
Shares ARE replicated as part of a replication group (if included). The secondary account receives the share definitions. However, share consumers are account-specific — the consumer accounts linked to the primary share must be re-linked to the secondary share after failover. Data listings via the Marketplace do not auto-failover. Plan: document all share consumers, include a re-link step in the DR runbook.
Time Travel is for recovering from accidental data changes (DROP, DELETE, UPDATE errors) within the same account, up to 90 days. Replication is for disaster recovery — recovering access to data when an entire account or region becomes unavailable. They serve orthogonal use cases: Time Travel = data-level recovery, Replication = availability and geo-redundancy. For a complete DR strategy, you need both: Time Travel handles human errors, replication handles infrastructure outages.
Yes — the pattern: maintain a secondary account as the "blue" environment, run new schema migrations there first, validate with read-only queries, then promote it to primary via failover (making it "green"). The old primary becomes the fallback. This provides zero-downtime schema changes and a clean rollback path. Limitation: replication is account-level, not database-level for failover — you promote the entire replication group.
Query REPLICATION_GROUP_REFRESH_HISTORY or DATABASE_REPLICATION_USAGE_HISTORY for refresh times and bytes transferred. For alerting: create a Snowflake Alert on a query over REPLICATION_GROUP_REFRESH_HISTORY where lag exceeds your RPO threshold — e.g., LAST_REFRESH_COMPLETED < DATEADD(minute, -10, CURRENT_TIMESTAMP()). Route alerts to a Slack/email notification integration. Test the alert by intentionally pausing replication in a non-production environment.