Part 5 — Data Modeling 101

Learn the Pattern · Part 5

Data Modeling 101

Why one big table is never enough — and why the shape of your data should follow how you use it, not how you collect it.

In 60 seconds

Two worlds, two opposite goals. Writing fast and reading fast want different shapes.

Operational (normalized) — built to WRITE: many small tables, no duplication, fast and consistent updates. Your app's database.
Analytical (denormalized) — built to READ: fewer, wider tables, deliberate duplication, fast aggregations. Your warehouse.
The core idea — the shape of data should follow how you USE it.
Why two models — fast writes and fast reads want opposite structures.
So we copy operational data into a model designed for analysis.

Two Jobs, Two Shapes

A data model is just a decision about how to structure tables and the relationships between them. The right structure depends entirely on the job. There are two very different jobs, and they pull in opposite directions.

An operational system — the database behind an app — is optimized to write: insert an order, update a balance, do it thousands of times a second without conflicts or duplicated facts. An analytical system — the warehouse — is optimized to read: scan billions of rows and aggregate them into a chart. Forcing one shape to do both jobs makes it bad at both.

Normalized (3NF) — for writes

Each fact stored exactly once
Many small tables, joined by keys
Updates are cheap and safe
No duplication, no contradictions
Great for transactions, bad for big scans

Denormalized — for reads

Deliberate duplication to avoid joins
Fewer, wider tables
Aggregations are fast
Updates are heavier (that's fine — analytics rarely updates)
Great for dashboards and ML features

The Concepts Behind It

1. Normalization — Eliminating Redundancy

Normalization splits data so every fact lives in exactly one place. Store a customer's address once, and reference it by key everywhere else. Change the address in one row and it's correct everywhere — no risk of three tables disagreeing. This is what keeps operational systems consistent under heavy writes.

2. Denormalization — Trading Space for Speed

Analytics flips the priority. Joins across many normalized tables are expensive when you're scanning billions of rows. So we deliberately duplicate data into wider tables, trading cheap storage for fast reads. Storage is cheap (Part 2); analyst time and query latency are not.

3. The Conceptual → Logical → Physical Ladder

Modeling moves through three levels: conceptual (the business entities — customers, orders, products), logical (tables, columns, keys, relationships, independent of any platform), and physical (the actual DDL, types, partitioning on a specific engine). Good models are designed at the logical level first — which is exactly why they transfer across platforms.

Same Data, Two Shapes

The same business facts, reshaped from many write-friendly tables into a read-friendly model.

Same Pattern, Every Platform

Modeling is designed at the logical level, so it's almost entirely platform-independent. Only the physical DDL changes:

The Concept	Snowflake	Databricks	BigQuery	Microsoft Fabric
Logical model	Identical — entities, keys, relationships are platform-agnostic
Build the model	SQL / dbt	SQL / dbt / Spark	SQL / dbt	SQL / Dataflows Gen2 / Notebooks
Semantic layer	dbt Semantic Layer	Unity Catalog metrics / dbt	Looker / dbt	Power BI semantic model

Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs.

The takeaway: data modeling is the most transferable skill in this whole series. A good logical model — the right entities, keys, and grain — is correct on every platform. Learn to model, and you're not learning Snowflake or Fabric; you're learning something that outlives both.

← Part 4 Batch vs Streaming Part 6 → Dimensional Modeling & the Star Schema

← Back to Publications