Part 8 — Open Table Formats

Learn the Pattern · Part 8

Delta, Iceberg, and Hudi — the metadata layer that turns a pile of files into a real table, and quietly ends vendor lock-in.

In 60 seconds

A pile of Parquet files in object storage isn't a table. Open table formats make it behave like one.

The problem — raw files can't do transactions, can't time-travel, can't safely handle many writers.
The fix — add a metadata layer on top of the files so they act like a real table.
ACID transactions — no half-written messes.
Time travel + schema evolution — query yesterday's version; add a column without rewriting everything.
One copy, many engines — and that's what ends vendor lock-in.

Files Are Not a Table

The lakehouse (Part 7) stores data as open columnar files — usually Parquet — in cheap object storage. But a folder full of Parquet files is just that: files. Try to update one row, run two jobs that write at once, or ask "what did this table look like last Tuesday?" and you hit a wall. There's no transaction log, no schema authority, no notion of a consistent table state.

Open table formats — Delta Lake, Apache Iceberg, and Apache Hudi — add a metadata layer on top of those files. That layer tracks which files belong to the table, at which version, with which schema. Suddenly the pile of files behaves like a proper database table — while staying open and engine-agnostic.
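To make "tracks which files belong to the table, at which version, with which schema" concrete, here is a minimal sketch of a commit log replayed into a table state. It is a toy model, not the actual on-disk layout of Delta Lake, Iceberg, or Hudi; the `Commit` class, the `table_state` function, and the file names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Commit:
    """One entry in a toy table log (hypothetical structure)."""
    version: int
    added: Tuple[str, ...] = ()
    removed: Tuple[str, ...] = ()
    schema: Optional[Tuple[str, ...]] = None  # None means "schema unchanged"


def table_state(log: List[Commit]) -> Tuple[Set[str], Tuple[str, ...]]:
    """Replay commits in version order to find the live files and schema."""
    files: Set[str] = set()
    schema: Tuple[str, ...] = ()
    for commit in sorted(log, key=lambda c: c.version):
        files -= set(commit.removed)
        files |= set(commit.added)
        if commit.schema is not None:
            schema = commit.schema
    return files, schema


log = [
    Commit(0, added=("part-000.parquet",), schema=("id", "amount")),
    Commit(1, added=("part-001.parquet",)),
    # A compaction-style commit: one file replaced by another.
    Commit(2, added=("part-002.parquet",), removed=("part-000.parquet",)),
]
files, schema = table_state(log)
print(sorted(files), schema)
```

The point of the sketch is that the storage folder may still contain `part-000.parquet`, but the table does not: only the log decides which files count.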

What the Metadata Layer Buys You

1. ACID Transactions

Writes become atomic: a job either fully commits or doesn't change the table at all. Readers never see half-written data, and when two jobs collide, the conflict is detected at commit time instead of leaving the table in a corrupt state. This is the single biggest thing that separates a "table" from "some files."
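One common way to get this behavior is optimistic concurrency: each writer tries to publish the next numbered log entry, and only one can succeed. The sketch below imitates that with exclusive file creation on a local temporary directory. It is an illustration under simplifying assumptions, not any format's real commit protocol; `try_commit`, `latest_version`, and the file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def latest_version(log_dir: Path) -> int:
    """Highest committed version, or -1 for an empty table."""
    versions = [int(p.stem) for p in log_dir.glob("*.json")]
    return max(versions, default=-1)


def try_commit(log_dir: Path, version: int, actions: dict) -> bool:
    """Publish commit `version`; return False if another writer claimed it."""
    entry = log_dir / f"{version:020d}.json"
    try:
        # Mode "x" creates the file exclusively and fails if it already exists.
        # Assumption: this stands in for an atomic put-if-absent on object
        # storage. A real system must also ensure readers never observe a
        # partially written entry, which this toy does not handle.
        with open(entry, "x") as f:
            json.dump(actions, f)
    except FileExistsError:
        return False
    return True


log_dir = Path(tempfile.mkdtemp())
next_version = latest_version(log_dir) + 1

# Two writers race for the same version number; only the first one wins.
writer_a = try_commit(log_dir, next_version, {"add": ["part-a.parquet"]})
writer_b = try_commit(log_dir, next_version, {"add": ["part-b.parquet"]})

# The loser re-reads the log and retries against the new latest version.
retry_b = try_commit(log_dir, latest_version(log_dir) + 1, {"add": ["part-b.parquet"]})
print(writer_a, writer_b, retry_b)
```

Data files written by the losing job are simply never referenced by the log, so readers never see them unless the retry succeeds.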

2. Time Travel

Because each commit creates a new version, you can query the table as of a past version or timestamp. That is invaluable for auditing, reproducing an ML training set, or recovering from a bad write: just roll back to the previous version. Old versions stay queryable only until retention or cleanup jobs remove the files they reference.
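As a rough sketch of how "as of" lookups and rollback can work, assume the table keeps a list of snapshots ordered by commit time. The timestamps, file names, and the `as_of_timestamp` and `rollback_to` helpers are hypothetical; real formats store this information in their own metadata structures.

```python
from bisect import bisect_right

# (commit_timestamp, version, files visible at that version); toy values.
snapshots = [
    (100, 0, frozenset({"part-000.parquet"})),
    (200, 1, frozenset({"part-000.parquet", "part-001.parquet"})),
    (300, 2, frozenset({"part-bad.parquet"})),  # a bad overwrite
]


def as_of_timestamp(ts: int):
    """Return the latest snapshot committed at or before `ts`."""
    timestamps = [s[0] for s in snapshots]
    i = bisect_right(timestamps, ts)
    if i == 0:
        raise ValueError("timestamp predates the first commit")
    return snapshots[i - 1]


def rollback_to(version: int):
    """Append a new snapshot that restores the file set of an older version."""
    target = next(s for s in snapshots if s[1] == version)
    last_ts, last_version, _ = snapshots[-1]
    restored = (last_ts + 1, last_version + 1, target[2])
    snapshots.append(restored)
    return restored


seen = as_of_timestamp(250)   # resolves to version 1
restored = rollback_to(1)     # becomes version 3 with version 1's files
print(seen[1], restored[1], sorted(restored[2]))
```

In this sketch the rollback is itself a new commit, so the bad version remains in history for auditing rather than being erased.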

3. Schema Evolution

Add, rename, or reorder columns without rewriting petabytes of files (which of these changes are metadata-only varies by format and table settings). The metadata layer tracks schema changes over time, so pipelines don't shatter the moment an upstream source adds a field.
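One way a format can make renames and additions cheap is to identify columns by a stable ID and keep the names only in metadata; Iceberg, for example, tracks columns by field ID. The sketch below is a toy illustration of that idea, with dictionaries standing in for data files. The `read` helper and all sample values are hypothetical.

```python
# Data "files" store values keyed by stable column ID, not by name.
old_file = [{1: 101, 2: 9.99}]           # written under schema_v1
new_file = [{1: 102, 2: 5.0, 3: "EUR"}]  # written after a column was added

schema_v1 = {1: "id", 2: "amount"}
# Rename column 1 and add column 3; no data file is rewritten.
schema_v2 = {1: "order_id", 2: "amount", 3: "currency"}


def read(rows, schema):
    """Project stored rows through a schema; missing columns read as None."""
    return [
        {name: row.get(col_id) for col_id, name in schema.items()}
        for row in rows
    ]


rows = read(old_file + new_file, schema_v2)
print(rows)
```

The old file is untouched: it surfaces under the new column name and returns `None` for the column that did not exist when it was written.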

4. The Real Prize: No Lock-In

Here's why this topic is suddenly everywhere. Because the format is open and the data sits in your own storage, multiple engines can read and write the same table. Your data is no longer trapped inside one vendor's proprietary format — Snowflake, Databricks, BigQuery, and Fabric can all operate on the same Iceberg or Delta table. The convergence on open formats (and interoperability layers between them) is the industry deciding that data should outlive any single engine.

The Layers, Visualized

[Figure: the metadata layer sits between the engines and the raw files, letting many engines share one table.]

Same Pattern, Every Platform

The Pattern	Snowflake	Databricks	BigQuery	Microsoft Fabric
Open table format	Apache Iceberg	Delta Lake (also Iceberg)	Iceberg via BigLake	Delta Lake (OneLake)
ACID + time travel	Yes	Yes	Yes	Yes
Cross-engine reads	Iceberg tables	Delta / UniForm (Iceberg)	BigLake / external tables	OneLake shortcuts

Feature names and interoperability options evolve quickly here — confirm current support (e.g., Delta⇄Iceberg interop) against vendor docs.

The takeaway: the format is the pattern; the engine is the product. Open table formats put a real table on top of open files, so your data stays portable across Snowflake, Databricks, BigQuery, and Fabric. This is the most concrete expression of "learn the pattern, not the product" in the whole stack.
