TL;DR
→ Delta Lake is easier to start with, especially if you’re already on Databricks
→ Iceberg wins on engine flexibility — works natively with Spark, Flink, Trino, Snowflake, and more without custom connectors
→ Delta Lake’s vendor coupling with Databricks is a real cost if you’re multi-cloud or multi-engine
→ Iceberg’s partition evolution lets you change partition schemes without rewriting data — that feature alone saved us a full weekend of migration work
→ Migration from Delta to Iceberg is harder than most blog posts suggest — budget four to eight weeks, not a weekend
→ If you’re greenfield, start with Iceberg. If Delta is working, don’t migrate until you hit a specific limit
I didn’t choose Iceberg because I read a benchmark blog post. I chose it after six months of hitting Delta Lake’s limits in ways that weren’t obvious until they were expensive.
We were running a mid-sized data lakehouse — S3-backed, Spark for processing, Snowflake for consumption, dbt for transformation. Delta Lake was the default choice. Everyone on the team had used it before. The documentation was solid. It worked — until it didn’t.
This isn’t a “here are the specs” comparison. You can get that from the docs. This is what actually happened when I ran both in production, why I made the switch, and what I’d tell you before you pick one.
What We Were Actually Trying to Solve
Before I get into the comparison, context matters. Our stack at the time: raw data landing in S3, Apache Spark for heavy transformation, Snowflake as the consumption layer for analysts, dbt for modeling, and Apache Airflow for orchestration.
We needed ACID transactions on S3, time travel for debugging, and the ability to do incremental loads without full partition rewrites. Delta Lake checked all those boxes — initially. The problems showed up at scale and at the edges.
Where Delta Lake Started Hurting Us
Engine Lock-In Was a Real Problem
Delta Lake works great if Spark is your only compute engine. The moment we tried to query Delta tables directly from Snowflake or Trino, things got complicated. The Delta protocol itself is open source, but in practice its transaction log is best supported by Spark and Databricks. Every other engine needs its own Delta connector, and not every engine has a first-class one.
We wanted analysts to query raw lakehouse tables directly from Snowflake without going through Spark first. With Delta, that required Snowflake’s Delta Sharing integration, which had limitations on what operations were supported. It wasn’t broken, but it added friction and another dependency to manage.
Apache Iceberg solves this cleanly. The table format is open. Snowflake, Spark, Flink, Trino, Athena, Dremio — they all read and write Iceberg natively. No connectors to manage. No format translation layer.
Partition Management Was Getting Messy
With Delta Lake, partitioning decisions are set at table creation. Changing a partition scheme means rewriting the table. At 100M+ rows, that’s not a quick operation.
We had a table partitioned by event_date. Six months in, query patterns changed: analysts were filtering by event_date and region together. Repartitioning meant a full backfill job over a weekend, plus repointing all downstream dbt models. I wrote about a similar pain point in the problem with dbt incremental models; the pattern is the same.
Iceberg’s partition evolution lets you change the partition spec without rewriting data. Old data stays as-is. New data uses the new scheme. Queries still work against both.
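To make the "queries still work against both" point concrete, here is a minimal sketch in Python. The `ALTER TABLE ... ADD PARTITION FIELD` statement is Iceberg's Spark SQL syntax and needs the Iceberg SQL extensions enabled; the table name `lake.analytics.events`, the file names, and the `plan_scan` helper are hypothetical. The planner is a toy model of the idea, not Iceberg's actual implementation, which also uses column statistics to skip files.

```python
from dataclasses import dataclass

# Spark SQL for the spec change (requires Iceberg's Spark SQL extensions).
# `lake.analytics.events` is a hypothetical table name.
ADD_REGION_SQL = "ALTER TABLE lake.analytics.events ADD PARTITION FIELD region"


@dataclass
class DataFile:
    path: str
    spec_id: int      # partition spec the file was written under
    partition: dict   # partition values recorded in table metadata


# Toy metadata: spec 0 = event_date only, spec 1 = event_date + region.
FILES = [
    DataFile("old-1.parquet", 0, {"event_date": "2024-01-01"}),
    DataFile("old-2.parquet", 0, {"event_date": "2024-01-02"}),
    DataFile("new-1.parquet", 1, {"event_date": "2024-07-01", "region": "eu"}),
    DataFile("new-2.parquet", 1, {"event_date": "2024-07-01", "region": "us"}),
]


def plan_scan(files, **filters):
    """Return paths that might hold matching rows.

    A file is skipped only when its own partition values contradict a filter.
    Files written before `region` joined the spec carry no region value, so a
    region filter cannot prune them and they stay in the scan.
    """
    selected = []
    for f in files:
        contradicted = any(
            col in f.partition and f.partition[col] != value
            for col, value in filters.items()
        )
        if not contradicted:
            selected.append(f.path)
    return selected
```

Calling `plan_scan(FILES, event_date="2024-07-01", region="eu")` keeps only the matching new-spec file, while `plan_scan(FILES, region="eu")` still includes the old files because nothing in their metadata rules them out. That is the trade-off: no rewrite, but old data only benefits from the old layout.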
Hidden Partitioning Changed How We Design Tables
Iceberg supports hidden partitioning — you define partition transforms like days(event_timestamp) or bucket(user_id, 16) and Iceberg handles physical partitioning transparently. Your queries don’t need to know about partition columns. The engine prunes automatically.
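A rough sketch of what that looks like, assuming Spark with an Iceberg catalog. The table name and columns are hypothetical. Note that in Iceberg's Spark DDL the bucket count comes first, as in `bucket(16, user_id)`. The two helper functions are a simplified illustration of how a filter on the raw timestamp maps to day partitions; they are not Iceberg code.

```python
from datetime import datetime, timezone

# Hypothetical table; PARTITIONED BY uses Iceberg's Spark DDL transforms.
CREATE_EVENTS_SQL = """
CREATE TABLE lake.analytics.events (
    event_id        BIGINT,
    user_id         BIGINT,
    region          STRING,
    event_timestamp TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(event_timestamp), bucket(16, user_id))
"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_transform(ts: datetime) -> int:
    """Days since the Unix epoch, the value the days() transform stores."""
    return (ts - EPOCH).days


def prune(partition_days, start: datetime, end: datetime):
    """Keep day partitions that may contain rows with start <= ts < end.

    The end day is kept even when `end` falls exactly on midnight, which is
    conservative: it may read one extra partition but never misses rows.
    """
    lo, hi = days_transform(start), days_transform(end)
    return [d for d in partition_days if lo <= d <= hi]
```

The analyst only ever writes `WHERE event_timestamp >= ... AND event_timestamp < ...`. The translation to day partitions happens in the planner, which is why nobody needs to know a partition column exists.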
With Delta Lake, you need to filter on the partition columns explicitly or you risk scanning far more data than the query needs. That’s fine when everyone knows the rules. It’s a problem when a new analyst writes a query without knowing which columns are partition keys.
Where Delta Lake Is Still Better
If you’re on Databricks, stay on Delta. The integration is tight, the tooling is mature, and Databricks has invested heavily in Delta’s performance. Liquid Clustering makes partition management much more flexible. If Databricks is your primary compute layer, switching to Iceberg gives you marginal benefit for non-trivial migration cost.
Delta’s MERGE performance on Spark is excellent. For high-frequency CDC workloads where you’re doing upserts at scale on Spark, Delta’s MERGE implementation is well-optimised. Iceberg’s MERGE has improved significantly but Delta still has an edge in some Spark-specific CDC patterns.
Delta has simpler operational overhead for small teams. Delta’s transaction log is easier to reason about. The tooling for vacuum, optimize, and Z-ordering is well-documented and predictable.
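For reference, the CDC upsert pattern mentioned above usually comes down to a single MERGE statement. The helper below is a hypothetical sketch that builds one; the table names, the column list, and the `op` change-type column (with `'D'` marking deletes) are assumptions for illustration. The generated SQL uses explicit column lists, so the same statement shape should also be accepted by Iceberg's Spark MERGE.

```python
def build_cdc_merge(target, source, keys, columns, op_col="op"):
    """Build a MERGE statement that applies a batch of CDC rows.

    Assumes `source` holds at most one row per key (deduplicate to the latest
    change first) and that `op_col` marks deletes with 'D'.
    """
    on_clause = " AND ".join(f"t.{k} = s.{k}" for k in keys)
    # Key columns identify the row, so only non-key columns are updated.
    updates = ", ".join(f"t.{c} = s.{c}" for c in columns if c not in keys)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return "\n".join([
        f"MERGE INTO {target} AS t",
        f"USING {source} AS s",
        f"ON {on_clause}",
        f"WHEN MATCHED AND s.{op_col} = 'D' THEN DELETE",
        f"WHEN MATCHED THEN UPDATE SET {updates}",
        f"WHEN NOT MATCHED AND s.{op_col} != 'D' "
        f"THEN INSERT ({insert_cols}) VALUES ({insert_vals})",
    ])


if __name__ == "__main__":
    print(build_cdc_merge(
        target="lake.analytics.events",
        source="staging.events_cdc",
        keys=["event_id"],
        columns=["event_id", "region", "amount"],
    ))
```

In a Spark job the resulting string would be passed to `spark.sql(...)`. Deduplicating the source first matters because Delta rejects a MERGE in which several source rows match the same target row.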
The Comparison You Actually Need
| Feature | Delta Lake | Apache Iceberg |
|---|---|---|
| Engine support | Spark-native; connectors for others | Truly multi-engine (Spark, Flink, Trino, Snowflake, Athena) |
| Partition evolution | Requires full table rewrite | Metadata-only change; no data rewrite needed |
| Hidden partitioning | Not supported | Supported — engines auto-prune |
| MERGE / CDC performance | Excellent on Spark | Strong, improving; slightly behind Delta on Spark CDC |
| Vendor alignment | Databricks ecosystem | Vendor-neutral, Apache foundation |
| Operational tooling | Mature, well-documented | Maturing fast; strong in 2024–2025 |
| Multi-cloud flexibility | Possible but friction | First-class support across clouds |
| Migration effort | N/A (starting point) | Non-trivial; plan 4–8 weeks |
The Migration: What It Actually Cost Us
I’ll be direct: the migration was harder than I expected. If you’ve read my piece on automation in data engineering, you’ll recognise the pattern — the technical part is rarely the hard part. It’s the downstream work nobody accounts for.
The core work wasn’t the data conversion — we used the delta-iceberg migration utility and it handled most of the heavy lifting. The harder parts were everything else.
Downstream dependency mapping. Every dbt model, every Airflow DAG, every Spark job that referenced a Delta table path needed updating. We had 40+ models. Two had hardcoded partition paths we didn’t catch until QA.
Metadata catalog updates. We use AWS Glue Data Catalog. Every table needed its metadata updated to reflect the Iceberg format. Glue’s Iceberg support has improved, but it’s not frictionless.
Testing the rollback plan. We kept Delta tables live for 30 days post-migration with a cutover switch in Airflow. That meant double-writing during the transition window, which added storage cost and pipeline complexity.
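The cutover switch itself can be very small. Below is one way it could be modelled in plain Python, with no Airflow dependency. The `route` function, the `Routing` type, and the `rollback` flag are hypothetical names, and the assumption that Delta serves as the fallback read source during the window is mine, not a description of the actual pipeline. In Airflow, the flag would typically come from some piece of runtime configuration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Routing:
    read_from: str    # format consumers should read
    write_to: tuple   # formats the pipeline must keep current


def route(run_date: date, cutover: date, rollback: bool = False,
          overlap_days: int = 30) -> Routing:
    """Decide read and write targets for a pipeline run."""
    overlap_end = cutover + timedelta(days=overlap_days)
    if run_date < cutover:
        return Routing("delta", ("delta",))
    if run_date < overlap_end:
        # Double-write window: both formats stay current, so flipping
        # `rollback` only changes which one consumers read.
        read_from = "delta" if rollback else "iceberg"
        return Routing(read_from, ("iceberg", "delta"))
    # Window closed: Delta is no longer maintained, so rollback is ignored.
    return Routing("iceberg", ("iceberg",))
```

The design choice worth noting is that rollback is only cheap while both copies are being written. Once the window closes, going back to Delta means a backfill, so the flag is deliberately ignored after that point.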
⚠️ The migration trap: The data conversion tooling works. What catches teams off guard is the downstream mapping work — every pipeline, model, and job that references a table path. Budget more time for that than for the actual format conversion.
Total elapsed time: six weeks. Two engineers. Not a weekend project.
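Much of the downstream mapping can be front-loaded with a simple repository scan before the migration starts. The script below is a sketch of that idea: the regex patterns, the file suffixes, and the `find_delta_references` name are illustrative assumptions and would need tuning for a real codebase. It flags Delta-specific references and hardcoded `column=value` partition paths, the kind of thing that otherwise only surfaces in QA.

```python
import re
from pathlib import Path

# Illustrative patterns only; extend them for your own conventions.
PATTERNS = {
    "delta_log_path": re.compile(r"_delta_log"),
    "delta_format": re.compile(r"""format\(\s*['"]delta['"]\s*\)""", re.IGNORECASE),
    "using_delta": re.compile(r"\bUSING\s+DELTA\b", re.IGNORECASE),
    # Hardcoded Hive-style partition directory, e.g. .../event_date=2024-01-01/
    "partition_path": re.compile(r"s3a?://\S*/\w+=\S+"),
}
SUFFIXES = {".py", ".sql", ".yml", ".yaml"}


def find_delta_references(root):
    """Return (relative_path, line_number, label) for every suspicious line."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SUFFIXES:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits.append((path.relative_to(root).as_posix(), lineno, label))
    return hits


if __name__ == "__main__":
    for rel_path, lineno, label in find_delta_references("."):
        print(f"{rel_path}:{lineno}: {label}")
```

Run it from the root of the dbt or Airflow repository and treat the output as a checklist. It will not catch paths assembled at runtime from variables, so it narrows the search without replacing a review.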
When to Choose Delta Lake
- Your primary compute layer is Databricks
- You’re a small team that wants simpler operations
- You’re doing high-frequency CDC on Spark
- You’re early stage — get something working first
When to Choose Iceberg
- You’re running multiple query engines (Spark + Snowflake, Trino + Flink)
- You need partition evolution without full table rewrites
- You’re building a vendor-neutral architecture
- Your analysts query the lakehouse directly from Snowflake
What I’d Do Differently
Start with Iceberg if you’re greenfield. The setup is slightly more involved, but you avoid the migration cost entirely. The ecosystem has matured enough in 2024–2025 that “Iceberg is less mature” is no longer a strong argument.
If you’re already on Delta and it’s working — don’t migrate for the sake of it. Migrate when you hit a specific limit: engine lock-in, partition inflexibility, or multi-cloud requirements.
And if you do migrate, don’t underestimate the downstream mapping work. The data conversion is the easy part.
Frequently Asked Questions
What is the main difference between Delta Lake and Apache Iceberg?
Delta Lake is a table format developed by Databricks, optimised for Spark workloads with strong Databricks integration. Apache Iceberg is an open table format designed for multi-engine environments — it works natively with Spark, Flink, Trino, Snowflake, and Athena without custom connectors. The core difference is engine flexibility.
Is Apache Iceberg better than Delta Lake?
It depends on your stack. Iceberg is better if you’re running multiple query engines or building a vendor-neutral architecture. Delta Lake is better if Databricks is your primary compute layer. Neither format is objectively superior.
Can Snowflake read Delta Lake tables?
Yes, through Delta Sharing or Snowflake’s Delta connector — but with limitations. Snowflake reads Iceberg tables natively as a first-class citizen, which is why multi-engine stacks tend to favour Iceberg.
How hard is it to migrate from Delta Lake to Apache Iceberg?
Harder than most blog posts suggest. The data conversion tooling handles the format migration, but remapping downstream pipelines, updating metadata catalogs, and testing rollback scenarios adds significant effort. Budget four to eight weeks for a production migration with 30–50 tables.
Does dbt support Apache Iceberg?
Yes. dbt supports Iceberg through the Spark and Athena adapters, and Snowflake’s Iceberg table support works with dbt models running on Snowflake. Production-ready as of 2024.
What is hidden partitioning in Apache Iceberg?
Hidden partitioning lets Iceberg manage partition logic transparently. You define partition transforms like days(event_timestamp) at the table level, and Iceberg handles physical file organisation and query pruning automatically — no need to filter on partition columns explicitly.