Learn the Pattern · Part 1

Anatomy of a Data Platform

Snowflake, Databricks, BigQuery, Microsoft Fabric — strip away the branding and every data platform is the same six layers. Learn them once, and any platform becomes navigable.

In 60 seconds

Here is a data platform with no vendor logos, just the pattern: six layers, stacked.

  1. Sources — where data is born: APIs, databases, files, event streams.
  2. Ingestion — how it gets in: connectors, batch loads, and streaming.
  3. Storage — where it rests: cheap, open, and decoupled from compute.
  4. Compute — the engines that process it: elastic, switched on only when needed.
  5. Serving — how people use it: SQL, BI dashboards, ML, and APIs.
  6. Governance — the wrapper around it all: security, catalog, and lineage.

Why This Map Matters

The fastest way to feel lost in this field is to learn a platform as a list of buttons. Buttons move, get renamed, and get replaced — and your knowledge expires with them. The fastest way to feel at home in any platform is to learn the six-layer map below, then ask one question whenever you meet a new tool: "Which layer is this, and what pattern is it implementing?"

Almost every feature in Snowflake, Databricks, BigQuery, or Microsoft Fabric slots cleanly into one of these six layers. Once you see that, a new platform stops being a wall of unfamiliar menus and becomes a familiar house with the furniture rearranged.

The Six Layers, Explained

1. Sources — Where Data Is Born

Sources are operational databases (the ones your apps write to), third-party APIs, files (CSV, JSON, Parquet), and event streams (clickstreams, IoT sensors, logs). The platform doesn't own this layer; it connects to it. This layer is essentially identical across every platform, because it lives outside the platform.

2. Ingestion — How Data Gets In

The bridge from sources into the platform. Two modes you'll see everywhere: batch (load a chunk on a schedule) and streaming (capture events continuously). Every platform ships managed connectors and pipeline tools so you write less custom plumbing — the names differ, the job is the same.
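The two modes can be sketched in a few lines. This is a minimal illustration, not any vendor's connector: `batch_ingest`, `stream_ingest`, and the in-memory `landing` lists are hypothetical names standing in for a managed pipeline and the platform's storage.

```python
from typing import Iterable


def batch_ingest(source: Iterable[dict], landing: list, batch_size: int) -> int:
    """Load records in chunks; return the number of load operations."""
    loads = 0
    chunk = []
    for record in source:
        chunk.append(record)
        if len(chunk) == batch_size:
            landing.extend(chunk)  # one bulk write per full chunk
            loads += 1
            chunk = []
    if chunk:  # flush the final partial chunk
        landing.extend(chunk)
        loads += 1
    return loads


def stream_ingest(source: Iterable[dict], landing: list) -> int:
    """Write each event as it arrives; return the number of writes."""
    writes = 0
    for event in source:
        landing.append(event)  # one write per event
        writes += 1
    return writes


# Hypothetical events; a real source would be a database, API, or event stream.
events = [{"id": i} for i in range(5)]
batch_landing, stream_landing = [], []
batch_loads = batch_ingest(events, batch_landing, batch_size=2)
stream_writes = stream_ingest(events, stream_landing)
```

Both modes land the same records. They differ in how often the platform writes and in how fresh the data is when someone queries it.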

3 & 4. Storage and Compute — Deliberately Separated

This is the single most important architectural idea in the modern stack, so it gets its own post (Part 2). The short version: data sits in cheap object storage, and compute engines are separate resources that you spin up to run a workload and shut down when it finishes. Storage is cheap and shared; compute is elastic and metered. That separation is why modern platforms can scale storage and compute independently and bill by usage rather than by provisioned hardware.
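A toy model makes the split concrete. The sketch below is an assumption-laden illustration, not how any platform is built: `ObjectStore` and `ComputeEngine` are hypothetical classes, and `metered_queries` stands in for whatever unit a real platform bills on.

```python
class ObjectStore:
    """Stand-in for cheap, shared object storage: it only holds data."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, rows: list) -> None:
        self._objects[key] = list(rows)

    def get(self, key: str) -> list:
        return self._objects[key]


class ComputeEngine:
    """Stand-in for an elastic engine: it holds no data of its own."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.running = False
        self.metered_queries = 0  # what a usage-based bill might count

    def __enter__(self) -> "ComputeEngine":
        self.running = True  # spin up
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.running = False  # shut down; stored data is untouched

    def total(self, key: str, column: str) -> int:
        if not self.running:
            raise RuntimeError("engine is not running")
        self.metered_queries += 1
        return sum(row[column] for row in self.store.get(key))


store = ObjectStore()
store.put("sales", [{"amount": 10}, {"amount": 32}])

# The engine exists only for the duration of the work.
with ComputeEngine(store) as engine:
    result = engine.total("sales", "amount")
```

After the `with` block the engine is off, but the data in `store` is still there for the next engine that needs it. That independence is the point of the separation.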

5. Serving — How People Actually Use It

The consumption layer: SQL endpoints for analysts, BI tools (Power BI, Tableau, Looker) for dashboards, access to features and training data for ML, and APIs for applications. This is where the platform finally turns into business value; everything below it exists to make this layer fast and trustworthy.
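As a rough sketch of two of those consumers reading the same data, the example below uses Python's built-in `sqlite3` as a stand-in for a platform SQL endpoint. The `sales` table, its rows, and the `sales_api` function are made up for illustration.

```python
import json
import sqlite3

# In-memory database standing in for the platform's SQL endpoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("south", 5), ("north", 7)],
)

# Analyst path: ad hoc SQL against the table.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()


def sales_api(region: str) -> str:
    """Application path: the same table behind a JSON-returning function."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
        (region,),
    ).fetchone()
    return json.dumps({"region": region, "total": total})
```

The analyst and the application never copy the data; they are two interfaces over one table.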

6. Governance — The Wrapper Around Everything

Not a step in the flow but a layer that wraps all the others: who can access what (security), what each dataset means (catalog), and where every number came from (lineage). Skip it and a data platform quietly becomes a "data swamp" nobody trusts. Part 12 closes the series here.
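One way to picture "wraps rather than sits in the flow" is a decorator around a processing step. This is a minimal sketch under stated assumptions: `CATALOG`, `LINEAGE`, `governed`, and the role names are all hypothetical, and real platforms enforce these controls in the platform itself rather than in user code.

```python
from functools import wraps

# Hypothetical catalog: what each dataset means and who may read it.
CATALOG = {
    "sales_raw": {"description": "One row per order", "readers": {"engineer"}},
    "sales_summary": {
        "description": "Total order amount",
        "readers": {"engineer", "analyst"},
    },
}
LINEAGE = []  # (input dataset, step name, output dataset)


def governed(reads: str, writes: str):
    """Wrap a processing step with an access check and a lineage record."""

    def decorator(step):
        @wraps(step)
        def wrapper(role: str, rows: list):
            if role not in CATALOG[reads]["readers"]:  # security
                raise PermissionError(f"{role} may not read {reads}")
            result = step(rows)
            LINEAGE.append((reads, step.__name__, writes))  # lineage
            return result

        return wrapper

    return decorator


@governed(reads="sales_raw", writes="sales_summary")
def summarize(rows: list) -> dict:
    return {"total": sum(row["amount"] for row in rows)}


summary = summarize("engineer", [{"amount": 10}, {"amount": 32}])
```

The `summarize` step knows nothing about permissions or lineage. Those concerns live in the wrapper, which is why the same governance can surround every other layer.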

The Six Layers, Visualized

Notice how storage and compute are split, and how governance wraps the entire stack rather than sitting in the pipeline:

[Diagram: The 6 Layers, data flowing through the pipeline. 1 · Sources (APIs · DBs · Events) → 2 · Ingestion (Batch & Streaming) → 3 · Storage (Cheap · Open) → 4 · Compute (Elastic engines) → 5 · Serving (SQL · BI · ML), with Governance (security · catalog · lineage) wrapping every layer.]

Solid arrows = data flow  |  Dashed arrows = governance wrapping every layer.

Same Pattern, Every Platform

Here is the whole point of this series in one table. The layers don't change — only the labels do:

| Layer      | Snowflake                   | Databricks                      | BigQuery                         | Microsoft Fabric                        |
|------------|-----------------------------|---------------------------------|----------------------------------|-----------------------------------------|
| Ingestion  | Snowpipe · connectors       | Auto Loader · Delta Live Tables | Dataflow · Storage Write API     | Data Factory pipelines · Dataflows Gen2 |
| Storage    | Managed / Iceberg tables    | Delta Lake on cloud storage     | BigQuery storage · BigLake       | OneLake                                 |
| Compute    | Virtual Warehouses          | Clusters · SQL Warehouses       | Slots (on-demand / reservations) | Capacities (F SKUs)                     |
| Serving    | Snowsight · SQL · Streamlit | SQL · Notebooks · Model Serving | SQL · BI Engine                  | Power BI · SQL endpoint · Direct Lake   |
| Governance | Horizon Catalog             | Unity Catalog                   | Dataplex · IAM                   | Microsoft Purview                       |

Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs before relying on them.
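The "which layer is this?" question can be turned into a lookup. The sketch below simply transcribes the table into a dictionary; `CAPABILITY_MAP` and `which_layer` are illustrative names, and the entries are only as current as the table itself.

```python
CAPABILITY_MAP = {
    "Ingestion": {
        "Snowflake": ["Snowpipe", "connectors"],
        "Databricks": ["Auto Loader", "Delta Live Tables"],
        "BigQuery": ["Dataflow", "Storage Write API"],
        "Microsoft Fabric": ["Data Factory pipelines", "Dataflows Gen2"],
    },
    "Storage": {
        "Snowflake": ["Managed / Iceberg tables"],
        "Databricks": ["Delta Lake on cloud storage"],
        "BigQuery": ["BigQuery storage", "BigLake"],
        "Microsoft Fabric": ["OneLake"],
    },
    "Compute": {
        "Snowflake": ["Virtual Warehouses"],
        "Databricks": ["Clusters", "SQL Warehouses"],
        "BigQuery": ["Slots (on-demand / reservations)"],
        "Microsoft Fabric": ["Capacities (F SKUs)"],
    },
    "Serving": {
        "Snowflake": ["Snowsight", "SQL", "Streamlit"],
        "Databricks": ["SQL", "Notebooks", "Model Serving"],
        "BigQuery": ["SQL", "BI Engine"],
        "Microsoft Fabric": ["Power BI", "SQL endpoint", "Direct Lake"],
    },
    "Governance": {
        "Snowflake": ["Horizon Catalog"],
        "Databricks": ["Unity Catalog"],
        "BigQuery": ["Dataplex", "IAM"],
        "Microsoft Fabric": ["Microsoft Purview"],
    },
}


def which_layer(feature: str) -> list[tuple[str, str]]:
    """Return every (platform, layer) pair where the feature name appears."""
    wanted = feature.lower()
    matches = []
    for layer, platforms in CAPABILITY_MAP.items():
        for platform, features in platforms.items():
            if wanted in (name.lower() for name in features):
                matches.append((platform, layer))
    return matches
```

For example, `which_layer("Unity Catalog")` places that feature in the Governance layer on Databricks, while a name the map does not contain returns an empty list, which is the cue to go read the vendor docs.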

The takeaway: you are not learning four platforms — you are learning one six-layer pattern that four platforms implement. Every concept in this series lives in one of these layers. When you meet platform number five, you'll already know its anatomy.
