Learn the Pattern · Part 2

Storage vs Compute: The Great Divorce

The single architectural decision that makes the modern data stack cheap to start, easy to scale, and billed by usage instead of hardware.

In 60 seconds

Why is the modern data stack so elastic? Because storage and compute stopped living in the same box.

The old way: storage and compute were welded together in one server. More query power meant buying a bigger box.
The divorce: data now sits in cheap object storage; compute is a separate engine you rent by the second.
Spin up on demand: start an engine to run a query, shut it down after. A suspended engine stops accruing compute charges.
Many engines, one copy: ten teams can query the same dataset at once without copying it.
Scale each side independently: store petabytes cheaply; burst compute for five minutes, then scale back.

Why the Two Got Separated

In a traditional database, the disks and the CPUs lived in the same machine. That coupling created two painful problems: you couldn't add query power without also paying for more storage you didn't need, and you couldn't store more data without buying more compute you weren't using. You sized for your busiest moment and paid for it 24/7.
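The coupling problem can be made concrete with a little arithmetic. The sketch below assumes a hypothetical server that bundles a fixed number of cores with a fixed amount of disk; the spec values and the function name are illustrative only and do not describe any real hardware.

```python
import math

# Hypothetical server spec: each box bundles a fixed amount of CPU and disk.
CORES_PER_SERVER = 16
TB_PER_SERVER = 10


def coupled_sizing(peak_cores: int, data_tb: int) -> dict:
    """Servers needed when CPU and disk can only be bought together."""
    servers = max(
        math.ceil(peak_cores / CORES_PER_SERVER),
        math.ceil(data_tb / TB_PER_SERVER),
    )
    return {
        "servers": servers,
        "idle_cores": servers * CORES_PER_SERVER - peak_cores,
        "unused_tb": servers * TB_PER_SERVER - data_tb,
    }


# Lots of data, light querying: storage drives the purchase, cores sit idle.
print(coupled_sizing(peak_cores=32, data_tb=200))
# {'servers': 20, 'idle_cores': 288, 'unused_tb': 0}

# Little data, heavy querying: compute drives the purchase, disks sit empty.
print(coupled_sizing(peak_cores=256, data_tb=20))
# {'servers': 16, 'idle_cores': 0, 'unused_tb': 140}
```

Whichever resource is scarcer sets the server count, and the other resource is bought along with it whether or not it is needed.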

Cloud object storage (Amazon S3, Azure Data Lake Storage, Google Cloud Storage, and Microsoft's OneLake, which is itself built on Azure Data Lake Storage Gen2) broke the coupling. Storage became a cheap, durable, always-on commodity with no practical capacity ceiling. Compute became a fleet of largely stateless engines that attach to that storage only when there's work to do. This single idea, decoupled storage and compute, is the foundation of the platforms compared below.

What This Unlocks

1. Elastic, Pay-Per-Use Economics

Because compute is metered separately, your bill tracks actual usage. A warehouse can auto-suspend when idle and auto-resume on the next query. You can run a massive engine for a five-minute month-end job and a tiny one for ad-hoc queries — on the same data, without moving it.
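To see how auto-suspend changes the bill, here is a minimal simulation. It assumes a simplified engine that resumes instantly when a query arrives and suspends after a fixed idle timeout; the `Query` class, the `billed_seconds` function, and the sample workload are all hypothetical. Real platforms may also apply minimum billing increments, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Query:
    start: int     # seconds since the start of the billing window
    duration: int  # seconds the query keeps the engine busy


def billed_seconds(queries: list[Query], auto_suspend: int) -> int:
    """Seconds the engine is running (and billed) under auto-suspend.

    The engine resumes when a query arrives, stays up while busy, and
    suspends after `auto_suspend` idle seconds with no new query.
    """
    total = 0
    run_start = None  # when the current running stretch began
    run_end = None    # when the engine suspends if nothing else arrives
    for q in sorted(queries, key=lambda q: q.start):
        up_until = q.start + q.duration + auto_suspend
        if run_end is not None and q.start <= run_end:
            # Engine is still up: extend the current stretch.
            run_end = max(run_end, up_until)
        else:
            # Engine was suspended: close the previous stretch, then resume.
            if run_end is not None:
                total += run_end - run_start
            run_start, run_end = q.start, up_until
    if run_end is not None:
        total += run_end - run_start
    return total


# Three queries in an 8-hour window, with a 60-second idle timeout.
workload = [Query(0, 120), Query(150, 60), Query(7200, 300)]
with_suspend = billed_seconds(workload, auto_suspend=60)
always_on = 8 * 3600

print(with_suspend)  # 630
print(always_on)     # 28800
```

In this made-up workload the engine is billed for 630 seconds instead of 28,800, because the long gap between the second and third query costs nothing in compute.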

2. Workload Isolation

Different teams can attach their own independent compute to the same shared storage. The data science team's heavy training job no longer slows down the finance team's dashboard, because they run on separate engines reading the same files. No copies, no compute contention, no "who's hogging the database?"
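The isolation idea can be sketched as a toy model: one storage object, two engines with private capacity. The `SharedStorage` and `Engine` classes below are hypothetical stand-ins, not any vendor's API, and the slot accounting is deliberately simplistic.

```python
class SharedStorage:
    """Stand-in for an object store holding one copy of each dataset."""

    def __init__(self):
        self._objects = {}

    def put(self, key, rows):
        self._objects[key] = tuple(rows)  # immutable, like stored files

    def get(self, key):
        return self._objects[key]  # a reference to the single copy


class Engine:
    """Stand-in for a compute engine with its own private slots."""

    def __init__(self, name, storage, slots):
        self.name = name
        self.storage = storage
        self.free_slots = slots

    def start_job(self, slots_needed):
        """Occupy slots for a long-running job on this engine only."""
        if slots_needed > self.free_slots:
            raise RuntimeError(f"{self.name} is out of capacity")
        self.free_slots -= slots_needed

    def query(self, key, fn):
        """Run a one-slot query against the shared data."""
        if self.free_slots < 1:
            raise RuntimeError(f"{self.name} is out of capacity")
        return fn(self.storage.get(key))


storage = SharedStorage()
storage.put("sales", [120, 80, 200])

data_science = Engine("data-science", storage, slots=8)
finance = Engine("finance", storage, slots=2)

data_science.start_job(8)            # training saturates its own engine
print(finance.query("sales", sum))   # 400: the finance engine is unaffected
```

Both engines hold a reference to the same `storage` object, so a saturated data science engine blocks only its own queries while finance keeps reading the single shared copy.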

3. Independent Scaling

Storage and compute grow on separate dials. Keep ten years of history in cheap storage while running modest compute, or pair a small dataset with a huge engine for a complex transformation. You size each side to its own need instead of compromising on one box.
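A rough cost model shows the two dials moving separately. The unit prices below are hypothetical round numbers picked to keep the arithmetic easy to follow, and `monthly_cost` is an illustrative helper, not a vendor pricing formula.

```python
# Hypothetical unit prices, chosen only to make the arithmetic easy to follow.
STORAGE_PRICE_PER_TB_MONTH = 20.0
COMPUTE_PRICE_PER_NODE_HOUR = 3.0


def monthly_cost(storage_tb: float, nodes: int, hours: float) -> float:
    """Storage and compute are priced on separate, independent dials."""
    storage = storage_tb * STORAGE_PRICE_PER_TB_MONTH
    compute = nodes * hours * COMPUTE_PRICE_PER_NODE_HOUR
    return storage + compute


# Long history, modest compute: 500 TB stored, 2 nodes for 40 hours.
archive_heavy = monthly_cost(storage_tb=500, nodes=2, hours=40)

# Small dataset, huge engine for a short transformation: 1 TB, 64 nodes for 2 hours.
compute_heavy = monthly_cost(storage_tb=1, nodes=64, hours=2)

print(archive_heavy)  # 10240.0
print(compute_heavy)  # 404.0
```

In the first scenario nearly all of the spend is storage; in the second nearly all of it is compute. Neither side forces the other to grow.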

How It Looks

Figure: one shared storage layer with multiple independent compute engines attaching to it on demand. One copy of the data, many engines, each scaling and billing independently.

Same Pattern, Every Platform

Every platform names its compute differently, but they all attach elastic engines to decoupled storage:

The Pattern	Snowflake	Databricks	BigQuery	Microsoft Fabric
Storage layer	Managed storage (cloud object store)	Delta Lake on S3/ADLS/GCS	BigQuery storage · BigLake	OneLake
Compute unit	Virtual Warehouse	Cluster · SQL Warehouse	Slots	Capacity (F SKU)
Idle cost control	Auto-suspend / resume	Auto-termination (clusters) · Auto-stop (SQL warehouses)	On-demand (pay per query) or reservations	Pause / resume capacity
Workload isolation	Separate warehouses	Separate clusters / SQL warehouses	Reservations / assignments	Separate capacities / workspaces

Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs.

The takeaway: when a new platform shows you "warehouses," "clusters," "slots," or "capacities," it's the same pattern — elastic compute renting a shared, decoupled storage layer. Learn the divorce once and the pricing model of every platform suddenly makes sense.
