Most data engineers I talk to still store everything in Snowflake native format. It’s simple: load data, query data, done. But here’s what nobody’s talking about: if you’re querying that data from anywhere else — Spark, Databricks, even just a local Python script — you’re paying a hidden “data tax.” Redundant storage, egress fees, ETL pipeline complexity. For Fortune 500 companies, that tax runs $2 million to $7 million a year. And Snowflake’s new Apache Iceberg v3 support (GA May 2026) actually changes the math. But migrating is a choice, not a reflex — and there are specific gotchas that’ll bite you if you don’t plan right.

Figure: The honest decision tree. Migrate if you’re paying egress fees or running multi-engine queries.

TL;DR

→ Apache Iceberg v3 is GA on Snowflake (May 2026). New features: deletion vectors (10x faster DML), row lineage for CDC, VARIANT type for semi-structured data, nanosecond timestamps, default column values.

→ Month-to-month costs are roughly equal to native Snowflake tables: compute is identical, and storage is ~$23/TB native vs ~$0.023/GB on S3 (about $23.55/TB), a negligible difference.

→ Migrate if: (a) you query from Spark/Databricks (egress fees kill you), (b) you’re paying >$500/month for Snowflake storage, (c) you want a single copy of truth across multiple engines.

→ Don’t migrate if: you only query from Snowflake, storage bill is small, and you’re not building a multi-engine architecture.

→ New gotchas: You can’t upgrade v2 tables in-place to v3. External engines (Spark) can’t write to v3 tables yet. Compaction of files written by external engines gets billed starting May 21, 2026.

→ Real win: Snowflake Storage for Iceberg (GA April 2026) means you don’t manage S3 buckets. Snowflake handles it, with Fail-safe recovery built in.

→ For Fortune 500 companies, the $2M–$7M annual “data tax” costs more than an Iceberg migration ever will.

The mental model that’s keeping you locked in

Here’s the picture most teams hold: Snowflake stores data. We query it in Snowflake. Done. Native tables, simple syntax, life is easy. And if you only query in Snowflake, that model works fine. You get the speed, the simplicity, the integration with dbt, the Time Travel.

But the moment you have data living in two systems (Snowflake for reporting, Spark for ML training, Databricks feeding a BI tool, even just a DuckDB instance on your laptop), you’ve broken the simple model. Now you have two copies of the data, or worse, a pipeline that’s constantly syncing between them. You’re paying Snowflake egress fees to get data out ($0.02 per GB across regions, $0.08 per GB between clouds). You’re rebuilding the same transformation logic in both systems. You’re managing schema evolution in two places. The complexity compounds.

Iceberg was built to solve exactly this. One copy of the data, on open cloud storage (S3, Azure Blob, GCS), readable by any engine that supports the Iceberg format. Snowflake, Spark, Databricks, Trino, DuckDB. All of them see the same table, the same schema, the same snapshot. No replication, no egress fees, no syncing.

But Iceberg isn’t free. It trades simplicity for flexibility. And for teams that genuinely don’t need that flexibility, native tables are still the right call.

Figure: The hidden cost of locking data into proprietary formats. For large teams, it’s massive.

What changed in Iceberg v3, and why it matters

Iceberg v2 (spec finalized in 2021) covered the basics: open format, ACID transactions, schema evolution, snapshots. v3 (released June 2025, GA on Snowflake May 7, 2026) added seven new capabilities. Only three actually change how you’d use it.

Deletion vectors. In v2, deleting or updating a row meant either rewriting the entire data file (copy-on-write) or writing separate position delete files that every reader had to merge back in at query time. Both get slow and expensive on large tables. v3 adds deletion vectors: a compact bitmap, stored in a separate small file, that marks rows as deleted without touching the original data. Result: 10x faster DML operations on large tables. If you’re doing frequent small updates (common in streaming ingestion), v3 matters.
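To make the difference concrete, here is a toy Python model contrasting a full file rewrite with a deletion vector. It is a sketch only: real deletion vectors are bitmaps stored with the table, not Python sets, and every function name below is hypothetical.

```python
# Toy model only: a "data file" is a list of rows, and a deletion vector
# is a set of row positions. All names here are hypothetical.

def delete_copy_on_write(data_file, positions_to_delete):
    """Rewrite the whole file, leaving out the deleted row positions."""
    doomed = set(positions_to_delete)
    return [row for pos, row in enumerate(data_file) if pos not in doomed]

def delete_with_vector(deletion_vector, positions_to_delete):
    """Record deleted positions; the data file itself is never touched."""
    return deletion_vector | set(positions_to_delete)

def read_with_vector(data_file, deletion_vector):
    """Readers skip any position marked in the deletion vector."""
    return [row for pos, row in enumerate(data_file) if pos not in deletion_vector]

data_file = [f"row-{i}" for i in range(1_000)]

rewritten = delete_copy_on_write(data_file, [3, 7])  # writes 998 rows
vector = delete_with_vector(set(), [3, 7])           # writes 2 positions

rows_written_cow = len(rewritten)
rows_written_dv = len(vector)
same_result = read_with_vector(data_file, vector) == rewritten
```

Both paths return the same 998 visible rows, but the rewrite touches every surviving row while the vector records only the two deleted positions. That gap is why small, frequent deletes benefit most.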

Row lineage. v3 tracks which rows were inserted, updated, or deleted with metadata fields (_row_id, _last_updated_sequence_number). This is how Snowflake implements change data capture (CDC) without external tooling. A Dynamic Iceberg Table can now refresh incrementally on only the rows that changed, not the whole partition. Critical for SCD2 and CDC pipelines.
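A minimal sketch of why those two fields enable incremental refresh, assuming a consumer that remembers the last sequence number it processed. This is not Snowflake’s implementation; the `Row` class and `changed_since` helper are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Row:
    row_id: int                        # stands in for _row_id
    last_updated_sequence_number: int  # stands in for _last_updated_sequence_number
    payload: str

def changed_since(rows, last_seen_sequence):
    """Return only rows modified after the consumer's last checkpoint."""
    return [r for r in rows if r.last_updated_sequence_number > last_seen_sequence]

table = [Row(1, 10, "a"), Row(2, 12, "b"), Row(3, 15, "c")]

# The consumer last refreshed at sequence 12, so only row 3 needs reprocessing.
changed = changed_since(table, last_seen_sequence=12)
changed_ids = [r.row_id for r in changed]
```

Here `changed_ids` is `[3]`: the refresh reads one row instead of rescanning the table.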

VARIANT type. v2 forced you to choose: store JSON as a string (slow parsing at query time) or explode it into a wide schema (thousands of nullable columns, query disasters). v3 adds native VARIANT support, and Snowflake automatically shreds it (extracts nested fields and indexes them) at write time. Query performance on semi-structured data jumps dramatically. This alone is why observability platforms are betting on Iceberg.
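As a rough illustration of what “shredding” means, the sketch below pulls chosen nested fields out of JSON records into separate columns at write time. It assumes a fixed list of paths and a hypothetical `shred` helper; Snowflake’s actual shredding and indexing are internal and more sophisticated.

```python
import json

def shred(records, paths):
    """Extract selected nested fields into per-path columns at write time."""
    columns = {path: [] for path in paths}
    for record in records:
        for path in paths:
            value = record
            for key in path.split("."):
                # Missing or non-object intermediate values become None.
                value = value.get(key) if isinstance(value, dict) else None
            columns[path].append(value)
    return columns

raw = [
    '{"user": {"id": 1, "plan": "pro"}, "latency_ms": 42}',
    '{"user": {"id": 2}, "latency_ms": 17}',
]
records = [json.loads(line) for line in raw]
columns = shred(records, ["user.id", "user.plan", "latency_ms"])
```

A query filtering on `user.id` can then scan one typed column instead of parsing every JSON string. Fields absent from a record, like `user.plan` in the second one, simply show up as `None`.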

The other four (default column values, geometry/geography types, nanosecond timestamps, partition transform improvements) are niche. Don’t worry about them unless you hit them.

The cost math: Native vs Iceberg in real dollars

Let’s be honest: most articles skip the cost comparison and jump to “Iceberg is cheaper!” It usually isn’t, month-to-month. Here’s why.

Figure: Two-column monthly cost breakdown.
Native Snowflake: compute 2,000 credits at $3 = $6,000; storage at $23/TB = $230; total $6,230/month.
Iceberg (Snowflake-managed): compute $6,000; S3 storage at $0.023/GB = $235; bundled compaction = $0; total $6,235/month.
Verdict: same cost, but Iceberg enables multi-engine and zero egress.

The real numbers. On a month-to-month basis, they’re nearly identical. The wins come from elsewhere.

For a typical 10 TB table with 1,000 queries per month (small-to-medium workload):

Native Snowflake: Compute 2,000 credits ($6,000) + Snowflake storage 10TB at $23/TB ($230) = $6,230/month.

Iceberg (Snowflake-managed storage, GA April 2026): Compute 2,000 credits ($6,000) + S3 storage 10TB (10,240 GB × $0.023/GB ≈ $235) + compaction bundled ($0) = $6,235/month.
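The arithmetic above can be reproduced with a few lines of Python, which makes it easy to plug in your own credit price and table size. The constants are the article’s example figures, not quoted list prices, and the function names are illustrative.

```python
# Example rates taken from the comparison above; adjust to your contract.
CREDIT_PRICE_USD = 3.00
NATIVE_STORAGE_USD_PER_TB = 23.00
S3_STORAGE_USD_PER_GB = 0.023
GB_PER_TB = 1024

def monthly_cost_native(credits, table_tb):
    """Compute plus Snowflake-native storage, in USD per month."""
    return credits * CREDIT_PRICE_USD + table_tb * NATIVE_STORAGE_USD_PER_TB

def monthly_cost_iceberg(credits, table_tb, compaction_usd=0.0):
    """Compute plus S3-priced storage plus any compaction charge."""
    storage = table_tb * GB_PER_TB * S3_STORAGE_USD_PER_GB
    return credits * CREDIT_PRICE_USD + storage + compaction_usd

native = monthly_cost_native(credits=2_000, table_tb=10)
iceberg = monthly_cost_iceberg(credits=2_000, table_tb=10)
difference = iceberg - native
```

With these inputs `native` is 6,230 and `iceberg` is about 6,235.52 (shown as $6,235 above), a difference of roughly $5.52 a month. The `compaction_usd` parameter is there so you can model billed compaction for external writes.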

Basically the same. Where Iceberg wins is not in monthly costs. It wins in:

Egress fees. If you query that 10 TB table (about 10,000 GB) from a Databricks cluster once a month, native Snowflake costs 10,000 GB × $0.08/GB (cross-cloud egress) = $800 per month. Iceberg: $0. Over a year, that’s $9,600. At any real-world scale (multi-engine queries), egress dominates.

No data duplication. If you’re currently syncing data between Snowflake and Databricks (ETL pipeline, manual export, Fivetran), that pipeline costs money too. Shared Iceberg table means you stop paying to move the data. One table, multiple readers.

Storage simplicity. With Snowflake Storage for Iceberg (new, April 2026), you don’t manage S3 buckets yourself. Snowflake handles encryption, replication, Fail-safe recovery. You save the operational tax of bucket management, lifecycle policies, and debugging storage issues.
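For the egress item in particular, a short calculation shows how quickly the bill scales with read frequency. This sketch assumes every external read pulls the full table and uses the per-GB rates quoted earlier in this article; the function name is illustrative.

```python
# Per-GB egress rates quoted earlier in this article (assumed, not list prices).
CROSS_REGION_USD_PER_GB = 0.02
CROSS_CLOUD_USD_PER_GB = 0.08

def annual_egress_usd(gb_per_read, reads_per_month, usd_per_gb):
    """Yearly egress cost if each external read pulls gb_per_read out."""
    return gb_per_read * reads_per_month * usd_per_gb * 12

cross_cloud_monthly_read = annual_egress_usd(10_000, 1, CROSS_CLOUD_USD_PER_GB)
cross_region_monthly_read = annual_egress_usd(10_000, 1, CROSS_REGION_USD_PER_GB)
cross_cloud_weekly_read = annual_egress_usd(10_000, 4, CROSS_CLOUD_USD_PER_GB)
```

One cross-cloud read a month comes to $9,600 a year, the same read across regions to $2,400, and four cross-cloud reads a month to $38,400. Against a storage difference of a few dollars, egress is the number that decides the comparison.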

So here’s the honest scorecard:

For Snowflake-only users: Native tables win. Simpler, no migration pain, costs are identical.

For multi-engine shops (Snowflake + Spark + Databricks): Iceberg wins. Egress fees alone justify the migration, and you get single source of truth as a bonus.

The gotchas that will hurt your migration

You can’t upgrade v2 tables to v3 in-place. There’s no ALTER TABLE ... SET ICEBERG_VERSION = 3. To get v3, you have to CREATE a new table. That means copying data (compute cost, time), repointing your queries, and hoping nothing breaks downstream. On large tables, this is a multi-day operation.

External engines can’t write v3 tables yet. You can read v3 tables from Spark, Trino, DuckDB, all day. But writing is blocked. Snowflake says it’s “planned,” but if you’re building a shared Iceberg table that Spark needs to update, you’re stuck on v2. This is a major limitation if you’re counting on true multi-engine write access.

Compaction gets billed starting May 21, 2026. When an external engine writes to an Iceberg table (via Spark, Trino, etc.), it creates small data files. Snowflake’s compaction automatically consolidates them into bigger files for query performance. Until May 21, that was free. Now it costs credits. Budget for ongoing compaction maintenance if you have heavy external write workloads.
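To picture what compaction does, here is a toy planner that groups small files into batches no larger than a target size. It is a greedy sketch with an assumed 128 MB target and a hypothetical `plan_compaction` helper, not Snowflake’s actual compaction algorithm.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files so each group is at most target_mb.

    A single file larger than target_mb ends up in a group of its own.
    """
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups

small_files = [4, 8, 16, 32, 64, 100, 120]  # sizes in MB
plan = plan_compaction(small_files)
files_before, files_after = len(small_files), len(plan)
```

Seven files collapse into three groups here. The more small files your external writers produce, the more of this work runs, and that work is what now consumes credits.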

⚠️ Don’t convert cloned tables if you rely on vended credentials. If you clone a native Snowflake table and then convert it to Iceberg, you can’t write to it with vended credentials (the temporary credentials handed to external query engines). You’d have to connect the external engine directly to your S3 bucket, defeating the whole point. Create the Iceberg table fresh if you’re using vended creds.

Schema changes are cheap but metadata bloat is real. Iceberg tracks every schema change as a separate metadata version. On tables with thousands of ALTER COLUMN operations, metadata can get unwieldy. Compact your metadata regularly with CALL SYSTEM$OPTIMIZE(...).

The mistakes teams make when migrating

1. Migrating for the wrong reason. “Everyone’s talking about Iceberg, so we should move.” Wrong. Migrate only if you have a concrete use case: egress fees, multi-engine queries, or storage cost >$500/month. Otherwise you’re trading simplicity for nothing.

2. Not testing external engine read performance first. Iceberg’s query performance depends heavily on your cloud setup, partitioning strategy, and how many small files are sitting around. Test Spark/Databricks queries on a small Iceberg table before migrating your 100 TB production table. You might find that your workload is slower on Iceberg, not faster.

3. Assuming v3 is backward-compatible with v2. It’s not. Engines that only understand v2 (older Spark runtimes or Trino versions, for example) will fail on v3 tables. Check that every tool in your stack supports v3 before upgrading. v2 → v3 is one-way; there’s no downgrade.

4. Ignoring the partition evolution story. Iceberg lets you change your partitioning scheme without rewriting the whole table. It’s a huge feature, but it’s also easy to mess up. Bad partitioning (e.g., partitioning by a column with 10 million distinct values) creates a partition explosion. Get your partitioning right before you migrate, not after.

5. Migrating everything at once. Pick one critical table, migrate it, test multi-engine queries for a month, then move the rest. Iceberg is mature enough for production, but it’s not old enough that every edge case is documented. Be intentional.
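The partition explosion from mistake 4 is easy to demonstrate on synthetic data. The sketch below generates hypothetical events and counts how many partitions you would get by partitioning on a high-cardinality `user_id` versus on the event day. The data and field names are made up for illustration.

```python
import random
from datetime import datetime, timedelta

random.seed(0)  # fixed seed so the synthetic data is repeatable
start = datetime(2026, 1, 1)

# 50,000 synthetic events spread over 30 days, with IDs drawn from 10 million users.
events = [
    {
        "user_id": random.randrange(10_000_000),
        "ts": start + timedelta(seconds=random.randrange(30 * 24 * 3600)),
    }
    for _ in range(50_000)
]

# One partition per distinct value of the partition expression.
partitions_by_user = len({e["user_id"] for e in events})
partitions_by_day = len({e["ts"].date() for e in events})
```

Partitioning by day yields at most 30 partitions, while partitioning by `user_id` yields nearly one partition per event. Running this kind of count against a sample of your own data before migrating is a cheap way to catch a bad partition key.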

When to actually migrate: The real decision

Stop and ask yourself: Do you actually need Iceberg?

Yes, if: You query the same data from Snowflake and Spark/Databricks. You’re paying egress fees. You have data warehouses in multiple clouds and want to query across them. You’re building a data lakehouse and want to ditch proprietary formats.

No, if: You only query from Snowflake. Your storage bill is <$500/month. You’re using Snowflake’s Time Travel, zero-copy clones, and other native features heavily. You don’t need to share data with other engines.
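The yes/no criteria above can be written down as a small checklist function, which is handy when you are triaging many tables. This is a sketch of the article’s rule of thumb; the `Workload` fields and the $500 default threshold mirror the text, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    """Hypothetical summary of how one table is used today."""
    queried_from_other_engines: bool  # Spark, Databricks, Trino, DuckDB...
    pays_egress_fees: bool
    monthly_storage_usd: float
    needs_single_copy_across_engines: bool = False

def reasons_to_migrate(w: Workload, storage_threshold_usd: float = 500.0) -> list[str]:
    """Return the concrete reasons to migrate; an empty list means stay native."""
    reasons = []
    if w.queried_from_other_engines:
        reasons.append("queried from engines other than Snowflake")
    if w.pays_egress_fees:
        reasons.append("paying egress fees")
    if w.monthly_storage_usd > storage_threshold_usd:
        reasons.append("storage bill above threshold")
    if w.needs_single_copy_across_engines:
        reasons.append("wants one copy shared across engines")
    return reasons

snowflake_only = Workload(False, False, 120.0)
multi_engine = Workload(True, True, 900.0, True)

stay_native = reasons_to_migrate(snowflake_only)
migrate = reasons_to_migrate(multi_engine)
```

An empty list for `snowflake_only` means there is no concrete reason to move. Returning the reasons rather than a bare boolean keeps the justification visible when you bring the decision to your team.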

For most teams, the answer is no. And that’s okay. Native Snowflake tables are extremely good. Simple, fast, well-integrated with dbt. There’s no shame in staying native.

But for teams hitting the “data tax” — redundant copies, egress fees, multi-engine complexity — Iceberg v3 actually delivers. The gotchas are real, but they’re manageable. The cost savings are modest month-to-month, but the flexibility is transformative.

The one principle that matters

Interoperability beats simplicity when you’re already paying for fragmentation. If your current architecture already costs you $800/month in egress, $300/month in ETL pipelines, and engineering time chasing sync issues, Iceberg’s “complexity” is actually a simplification. You’re not adding complexity; you’re replacing it with a standard.

If you’re simple and integrated today, stay there. Don’t pay the cost of flexibility you don’t need. But if you’re paying the data tax, Iceberg’s math changes fast.
