Storage vs Compute: The Great Divorce
The single architectural decision that makes the modern data stack cheap to start, easy to scale, and billed by usage instead of hardware.
Why is the modern data stack so elastic? Because storage and compute stopped living in the same box.
- The old way: storage and compute were welded together in one server. More query power meant buying a bigger box.
- The divorce: data now sits in cheap object storage; compute is a separate engine you rent by usage, often billed by the second.
- Spin up on demand: start an engine to run a query, shut it down after. A suspended engine costs nothing; only storage keeps billing.
- Many engines, one copy: ten teams can query the same dataset at once without copying it.
- Scale each side independently: store petabytes cheaply; burst compute for five minutes, then scale back.
Why the Two Got Separated
In a traditional database, the disks and the CPUs lived in the same machine. That coupling created two painful problems: you couldn't add query power without also paying for more storage you didn't need, and you couldn't store more data without buying more compute you weren't using. You sized for your busiest moment and paid for it 24/7.
Cloud object storage (Amazon S3, Azure Data Lake Storage, Google Cloud Storage, and Microsoft's OneLake) broke the coupling. Storage became a cheap, near-infinite, always-on commodity. Compute became a fleet of stateless engines that attach to that storage only when there's work to do. This single idea — decoupled storage and compute — is the foundation every modern platform is built on.
What This Unlocks
1. Elastic, Pay-Per-Use Economics
Because compute is metered separately, your bill tracks actual usage. A warehouse can auto-suspend when idle and auto-resume on the next query. You can run a massive engine for a five-minute month-end job and a tiny one for ad-hoc queries — on the same data, without moving it.
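To make the billing argument concrete, here is a minimal sketch comparing an engine that runs all month with one that suspends when idle. The hourly rate, the busy hours, and the function names are hypothetical illustration values, not any vendor's pricing. Storage is billed separately in both cases and is left out.

```python
HOURS_PER_MONTH = 730  # rough average hours in a month (assumption for illustration)


def always_on_cost(rate_per_hour: float) -> float:
    """Engine that never suspends: billed for every hour of the month."""
    return rate_per_hour * HOURS_PER_MONTH


def pay_per_use_cost(rate_per_hour: float, busy_hours: float) -> float:
    """Engine that auto-suspends: billed only for the hours it actually runs."""
    return rate_per_hour * busy_hours


# Hypothetical numbers: a $4/hour engine that is busy 60 hours a month.
rate = 4.0
busy = 60.0

fixed = always_on_cost(rate)            # 4.0 * 730 = 2920.0
metered = pay_per_use_cost(rate, busy)  # 4.0 * 60  = 240.0
print(f"always on: ${fixed:.2f}, pay per use: ${metered:.2f}")
```

The gap between the two figures is the cost of idle hours, which is what auto-suspend removes.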
2. Workload Isolation
Different teams can attach their own independent compute to the same shared storage. The data science team's heavy training job no longer slows down the finance team's dashboard, because they run on separate engines reading the same files. No copies, no contention, no "who's hogging the database?"
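The isolation idea can be sketched as a toy model: two engines hold a reference to the same storage object, and each keeps its own usage meter. The `SharedStorage` and `Engine` classes are hypothetical stand-ins written for this illustration; they are not any platform's API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Stands in for an object store holding one copy of each dataset."""
    datasets: dict[str, list[int]] = field(default_factory=dict)


@dataclass
class Engine:
    """Stands in for a compute engine with its own usage meter."""
    name: str
    storage: SharedStorage
    rows_scanned: int = 0  # usage is tracked per engine, not per dataset

    def total(self, dataset: str) -> int:
        rows = self.storage.datasets[dataset]  # read in place, no copy made
        self.rows_scanned += len(rows)
        return sum(rows)


storage = SharedStorage({"sales": [120, 80, 200]})
finance = Engine("finance", storage)
data_science = Engine("data_science", storage)

# Data science runs a heavy job; finance runs a single dashboard query.
for _ in range(1000):
    data_science.total("sales")
finance_total = finance.total("sales")

print(finance_total, finance.rows_scanned, data_science.rows_scanned)
```

The heavy job shows up only on the data science engine's meter; the finance engine's usage reflects its one query, even though both read the same dataset.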
3. Independent Scaling
Storage and compute grow on separate dials. Keep ten years of history in cheap storage while running modest compute, or pair a small dataset with a huge engine for a complex transformation. You size each side to its own need instead of compromising on one box.
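The two dials can be expressed as a cost function with separate storage and compute terms. This is a minimal sketch; the rates, the notion of an "engine size" multiplier, and the `monthly_cost` helper are all assumptions made up for illustration.

```python
STORAGE_RATE_PER_TB = 20.0   # hypothetical $ per TB per month
COMPUTE_RATE_PER_UNIT = 2.0  # hypothetical $ per engine-size unit per hour


def monthly_cost(storage_tb: float, engine_size: int, compute_hours: float) -> float:
    """Storage and compute are priced by separate terms, so each can be sized alone."""
    storage_cost = storage_tb * STORAGE_RATE_PER_TB
    compute_cost = engine_size * COMPUTE_RATE_PER_UNIT * compute_hours
    return storage_cost + compute_cost


# Large history, modest compute: 500 TB stored, size-1 engine for 100 hours.
archive_heavy = monthly_cost(storage_tb=500, engine_size=1, compute_hours=100)

# Small dataset, huge engine: 1 TB stored, size-16 engine for 5 hours.
compute_heavy = monthly_cost(storage_tb=1, engine_size=16, compute_hours=5)

print(archive_heavy, compute_heavy)
```

In a coupled system both scenarios would have to be served by one box sized for the larger of the two needs on each axis.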
How It Looks
[Diagram: one shared storage layer with multiple independent compute engines attaching to it on demand. One copy of the data, many engines, each scaling and billing independently.]
Same Pattern, Every Platform
Every platform names its compute differently, but they all attach elastic engines to decoupled storage:
| The Pattern | Snowflake | Databricks | BigQuery | Microsoft Fabric |
|---|---|---|---|---|
| Storage layer | Managed storage (cloud object store) | Delta Lake on S3/ADLS/GCS | BigQuery storage · BigLake | OneLake |
| Compute unit | Virtual Warehouse | Cluster · SQL Warehouse | Slots | Capacity (F SKU) |
| Idle cost control | Auto-suspend / resume | Auto-termination | On-demand (pay per query) or reservations | Pause / resume capacity |
| Workload isolation | Separate warehouses | Separate clusters | Reservations / assignments | Separate capacities / workspaces |
Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs.
The takeaway: "warehouses," "clusters," "slots," and "capacities" are all the same pattern, elastic compute attached to a shared, decoupled storage layer. Learn the divorce once and the pricing model of every platform suddenly makes sense.