← Back to Series Overview
Learn the Pattern · Part 5

Data Modeling 101

Why one big table is never enough — and why the shape of your data should follow how you use it, not how you collect it.

In 60 seconds

Two worlds, two opposite goals. Writing fast and reading fast want different shapes.

  1. Operational (normalized) — built to WRITE: many small tables, no duplication, fast and consistent updates. Your app's database.
  2. Analytical (denormalized) — built to READ: fewer, wider tables, deliberate duplication, fast aggregations. Your warehouse.
  3. The core idea — the shape of data should follow how you USE it.
  4. Why two models — fast writes and fast reads want opposite structures.
  5. So we copy operational data into a model designed for analysis.

Two Jobs, Two Shapes

A data model is just a decision about how to structure tables and the relationships between them. The right structure depends entirely on the job. There are two very different jobs, and they pull in opposite directions.

An operational system — the database behind an app — is optimized to write: insert an order, update a balance, do it thousands of times a second without conflicts or duplicated facts. An analytical system — the warehouse — is optimized to read: scan billions of rows and aggregate them into a chart. Forcing one shape to do both jobs makes it bad at both.

Normalized (3NF) — for writes
  • Each fact stored exactly once
  • Many small tables, joined by keys
  • Updates are cheap and safe
  • No duplication, no contradictions
  • Great for transactions, bad for big scans
Denormalized — for reads
  • Deliberate duplication to avoid joins
  • Fewer, wider tables
  • Aggregations are fast
  • Updates are heavier (that's fine — analytics rarely updates)
  • Great for dashboards and ML features

The Concepts Behind It

1. Normalization — Eliminating Redundancy

Normalization splits data so every fact lives in exactly one place. Store a customer's address once, and reference it by key everywhere else. Change the address in one row and it's correct everywhere — no risk of three tables disagreeing. This is what keeps operational systems consistent under heavy writes.

2. Denormalization — Trading Space for Speed

Analytics flips the priority. Joins across many normalized tables are expensive when you're scanning billions of rows. So we deliberately duplicate data into wider tables, trading cheap storage for fast reads. Storage is cheap (Part 2); analyst time and query latency are not.

3. The Conceptual → Logical → Physical Ladder

Modeling moves through three levels: conceptual (the business entities — customers, orders, products), logical (tables, columns, keys, relationships, independent of any platform), and physical (the actual DDL, types, partitioning on a specific engine). Good models are designed at the logical level first — which is exactly why they transfer across platforms.

Same Data, Two Shapes

Same facts, reshaped: normalized → analytical 🖊️ Operational — normalized (writes) orders customers products ⚙️ ELT / pipeline 📊 Analytical — denormalized (reads) 🧾 wide sales table order + customer + product in one row

The same business facts, reshaped from many write-friendly tables into a read-friendly model.

Same Pattern, Every Platform

Modeling is designed at the logical level, so it's almost entirely platform-independent. Only the physical DDL changes:

The ConceptSnowflakeDatabricksBigQueryMicrosoft Fabric
Logical modelIdentical — entities, keys, relationships are platform-agnostic
Build the modelSQL / dbtSQL / dbt / SparkSQL / dbtSQL / Dataflows Gen2 / Notebooks
Semantic layerdbt Semantic LayerUnity Catalog metrics / dbtLooker / dbtPower BI semantic model

Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs.

The takeaway: data modeling is the most transferable skill in this whole series. A good logical model — the right entities, keys, and grain — is correct on every platform. Learn to model, and you're not learning Snowflake or Fabric; you're learning something that outlives both.

← Back to Publications