Anatomy of a Data Platform
Snowflake, Databricks, BigQuery, Microsoft Fabric — strip away the branding and every data platform is the same six layers. Learn them once, and any platform becomes navigable.
Stripped of vendor logos, a data platform is just the pattern: six layers, stacked.
- Sources — where data is born: APIs, databases, files, event streams.
- Ingestion — how it gets in: connectors, batch loads, and streaming.
- Storage — where it rests: cheap, open, and decoupled from compute.
- Compute — the engines that process it: elastic, switched on only when needed.
- Serving — how people use it: SQL, BI dashboards, ML, and APIs.
- Governance — the wrapper around it all: security, catalog, and lineage.
Why This Map Matters
The fastest way to feel lost in this field is to learn a platform as a list of buttons. Buttons move, get renamed, and get replaced — and your knowledge expires with them. The fastest way to feel at home in any platform is to learn the six-layer map below, then ask one question whenever you meet a new tool: "Which layer is this, and what pattern is it implementing?"
Almost every feature in Snowflake, Databricks, BigQuery, or Microsoft Fabric slots cleanly into one of these six layers. Once you see that, a new platform stops being a wall of unfamiliar menus and becomes a familiar house with the furniture rearranged.
The Six Layers, Explained
1. Sources — Where Data Is Born
Operational databases (the ones your apps write to), third-party APIs, flat files (CSV, JSON, Parquet), and event streams (clickstreams, IoT sensors, logs). The platform doesn't own this layer — it connects to it. This layer is essentially identical across every platform, because it lives outside the platform.
2. Ingestion — How Data Gets In
The bridge from sources into the platform. Two modes you'll see everywhere: batch (load a chunk on a schedule) and streaming (capture events continuously). Every platform ships managed connectors and pipeline tools so you write less custom plumbing — the names differ, the job is the same.
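The two modes can be sketched in a few lines of Python. This is an illustration only, not any vendor's connector API: `ingest_batch`, `ingest_stream`, the `load` callback, and the `chunk_size` parameter are hypothetical names, and a plain list stands in for a platform table.

```python
from typing import Callable, Iterable

Record = dict  # one source row or event

def ingest_batch(source: Iterable[Record],
                 load: Callable[[list[Record]], None],
                 chunk_size: int = 3) -> int:
    """Batch mode: accumulate records and load them one chunk at a time."""
    chunk: list[Record] = []
    loads = 0
    for record in source:
        chunk.append(record)
        if len(chunk) == chunk_size:
            load(chunk)
            loads += 1
            chunk = []
    if chunk:  # flush the final partial chunk
        load(chunk)
        loads += 1
    return loads

def ingest_stream(source: Iterable[Record],
                  load: Callable[[list[Record]], None]) -> int:
    """Streaming mode: hand each event to the platform as soon as it arrives."""
    loads = 0
    for record in source:
        load([record])
        loads += 1
    return loads

# Hypothetical source data and two stand-in "tables".
events = [{"id": i} for i in range(7)]
batch_table: list[Record] = []
stream_table: list[Record] = []

batch_loads = ingest_batch(events, batch_table.extend)
stream_loads = ingest_stream(events, stream_table.extend)
```

Both tables end up with the same seven records. What differs is how many load operations it took and how soon each record became available, which is the trade-off between the two modes.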
3 & 4. Storage and Compute — Deliberately Separated
This is the single most important architectural idea in the modern stack, so it gets its own post (Part 2). The short version: data sits in cheap object storage, and compute engines are separate resources you spin up to run a workload, then shut down. Storage is cheap and shared; compute is elastic and metered. That separation is why modern platforms can scale storage and compute independently and bill for what you use rather than for hardware you provision.
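A toy model of that separation, as a rough sketch rather than how any real engine works: a dictionary stands in for object storage, and a context manager stands in for an engine that exists only while a query runs. `OBJECT_STORE`, `Engine`, `compute`, and `metered` are all hypothetical names, and the numbers are made-up sample data.

```python
import time
from contextlib import contextmanager

# Stand-in for object storage: always present, shared by every engine.
OBJECT_STORE = {
    "sales/2024.parquet": [120, 80, 200],
    "sales/2025.parquet": [150, 95],
}

metered: dict[str, float] = {}  # seconds of compute used, per engine name

class Engine:
    """A compute engine that holds no data of its own."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.running = True

    def scan_sum(self, prefix: str) -> int:
        if not self.running:
            raise RuntimeError(f"{self.name} is shut down")
        return sum(
            value
            for key, rows in OBJECT_STORE.items()
            if key.startswith(prefix)
            for value in rows
        )

@contextmanager
def compute(name: str):
    """Spin an engine up, meter its running time, and shut it down afterwards."""
    engine = Engine(name)
    started = time.perf_counter()
    try:
        yield engine
    finally:
        engine.running = False
        metered[name] = metered.get(name, 0.0) + time.perf_counter() - started

# Two independent engines read the same stored data.
with compute("analyst-warehouse") as engine:
    total = engine.scan_sum("sales/")

with compute("ml-cluster") as ml_engine:
    total_2025 = ml_engine.scan_sum("sales/2025")
```

After both blocks finish, the data in `OBJECT_STORE` is untouched, both engines are shut down, and `metered` records only the time each one was actually running.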
5. Serving — How People Actually Use It
The consumption layer: SQL endpoints for analysts, BI tools (Power BI, Tableau, Looker) for dashboards, feature and training-data access for ML, and APIs for applications. This is where the platform finally turns into business value; everything below it exists to make this layer fast and trustworthy.
6. Governance — The Wrapper Around Everything
Not a step in the flow but a layer that wraps all the others: who can access what (security), what each dataset means (catalog), and where every number came from (lineage). Skip it and a data platform quietly becomes a "data swamp" nobody trusts. Part 12 closes the series here.
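One way to picture "wraps rather than sits in the flow" is a decorator: the governed operation stays the same, and the checks happen around it. The sketch below is purely illustrative; `GRANTS`, `CATALOG`, `LINEAGE`, `governed`, and `query_total` are hypothetical names, not part of any platform's API.

```python
from functools import wraps

GRANTS = {"analyst": {"sales"}, "intern": set()}            # security: who can read what
CATALOG = {"sales": "Daily order totals, one row per day"}  # catalog: what a dataset means
LINEAGE: list[tuple[str, str, str]] = []                    # lineage: (user, dataset, operation)

def governed(operation):
    """Wrap a data operation with catalog, access, and lineage checks."""
    @wraps(operation)
    def wrapper(user, dataset, *args, **kwargs):
        if dataset not in CATALOG:
            raise KeyError(f"{dataset!r} is not in the catalog")
        if dataset not in GRANTS.get(user, set()):
            raise PermissionError(f"{user!r} may not read {dataset!r}")
        LINEAGE.append((user, dataset, operation.__name__))
        return operation(user, dataset, *args, **kwargs)
    return wrapper

@governed
def query_total(user, dataset):
    """A stand-in for any serving-layer query; the data is made up."""
    data = {"sales": [120, 80, 200]}
    return sum(data[dataset])

total = query_total("analyst", "sales")

try:
    query_total("intern", "sales")
    denied = False
except PermissionError:
    denied = True
```

The query function knows nothing about permissions or lineage. The wrapper records the one allowed call and rejects the other before it ever touches the data.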
The Six Layers, Visualized
Notice how storage and compute are split, and how governance wraps the entire stack rather than sitting in the pipeline:
Diagram legend: solid arrows show data flow; dashed arrows show governance wrapping every layer.
Same Pattern, Every Platform
Here is the whole point of this series in one table. The layers don't change; only the labels do. (Sources has no row because it lives outside every platform.)
| The Layer | Snowflake | Databricks | BigQuery | Microsoft Fabric |
|---|---|---|---|---|
| Ingestion | Snowpipe · connectors | Auto Loader · Delta Live Tables | Dataflow · Storage Write API | Data Factory pipelines · Dataflows Gen2 |
| Storage | Managed / Iceberg tables | Delta Lake on cloud storage | BigQuery storage · BigLake | OneLake |
| Compute | Virtual Warehouses | Clusters · SQL Warehouses | Slots (on-demand / reservations) | Capacities (F SKUs) |
| Serving | Snowsight · SQL · Streamlit | SQL · Notebooks · Model Serving | SQL · BI Engine | Power BI · SQL endpoint · Direct Lake |
| Governance | Horizon Catalog | Unity Catalog | Dataplex · IAM | Microsoft Purview |
Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs before relying on them.
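The question "which layer is this?" can be treated as a lookup over the table above. The sketch below transcribes the table into a dictionary, shortening the two compute entries that carry parenthetical notes to "Slots" and "Capacities"; `PLATFORM_MAP` and `which_layer` are hypothetical names, and the feature names are only as current as the table itself.

```python
# Transcribed from the table above; vendor names may have changed since.
PLATFORM_MAP: dict[str, dict[str, list[str]]] = {
    "Ingestion": {
        "Snowflake": ["Snowpipe", "connectors"],
        "Databricks": ["Auto Loader", "Delta Live Tables"],
        "BigQuery": ["Dataflow", "Storage Write API"],
        "Microsoft Fabric": ["Data Factory pipelines", "Dataflows Gen2"],
    },
    "Storage": {
        "Snowflake": ["Managed / Iceberg tables"],
        "Databricks": ["Delta Lake on cloud storage"],
        "BigQuery": ["BigQuery storage", "BigLake"],
        "Microsoft Fabric": ["OneLake"],
    },
    "Compute": {
        "Snowflake": ["Virtual Warehouses"],
        "Databricks": ["Clusters", "SQL Warehouses"],
        "BigQuery": ["Slots"],
        "Microsoft Fabric": ["Capacities"],
    },
    "Serving": {
        "Snowflake": ["Snowsight", "SQL", "Streamlit"],
        "Databricks": ["SQL", "Notebooks", "Model Serving"],
        "BigQuery": ["SQL", "BI Engine"],
        "Microsoft Fabric": ["Power BI", "SQL endpoint", "Direct Lake"],
    },
    "Governance": {
        "Snowflake": ["Horizon Catalog"],
        "Databricks": ["Unity Catalog"],
        "BigQuery": ["Dataplex", "IAM"],
        "Microsoft Fabric": ["Microsoft Purview"],
    },
}

def which_layer(feature: str) -> list[tuple[str, str]]:
    """Return every (layer, platform) pair that lists this feature, ignoring case."""
    needle = feature.casefold()
    return [
        (layer, platform)
        for layer, platforms in PLATFORM_MAP.items()
        for platform, features in platforms.items()
        if any(needle == name.casefold() for name in features)
    ]

unity = which_layer("unity catalog")
onelake = which_layer("OneLake")
sql = which_layer("SQL")
```

A feature that appears under several platforms, such as plain SQL, comes back once per platform but always under the same layer.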
The takeaway: you are not learning four platforms — you are learning one six-layer pattern that four platforms implement. Every concept in this series lives in one of these layers. When you meet platform number five, you'll already know its anatomy.