Data Modeling 101
Why one big table is never enough — and why the shape of your data should follow how you use it, not how you collect it.
Two worlds, two opposite goals. Writing fast and reading fast want different shapes.
- Operational (normalized) — built to WRITE: many small tables, no duplication, fast and consistent updates. Your app's database.
- Analytical (denormalized) — built to READ: fewer, wider tables, deliberate duplication, fast aggregations. Your warehouse.
- The core idea — the shape of data should follow how you USE it.
- Why two models — fast writes and fast reads want opposite structures.
- So we copy operational data into a model designed for analysis.
Two Jobs, Two Shapes
A data model is just a decision about how to structure tables and the relationships between them. The right structure depends entirely on the job. There are two very different jobs, and they pull in opposite directions.
An operational system — the database behind an app — is optimized to write: insert an order, update a balance, do it thousands of times a second without conflicts or duplicated facts. An analytical system — the warehouse — is optimized to read: scan billions of rows and aggregate them into a chart. Forcing one shape to do both jobs makes it bad at both.
Normalized (3NF) — for writes
- Each fact stored exactly once
- Many small tables, joined by keys
- Updates are cheap and safe
- No duplication, no contradictions
- Great for transactions, bad for big scans
Denormalized — for reads
- Deliberate duplication to avoid joins
- Fewer, wider tables
- Aggregations are fast
- Updates are heavier (that's fine — analytics rarely updates)
- Great for dashboards and ML features
The Concepts Behind It
1. Normalization — Eliminating Redundancy
Normalization splits data so every fact lives in exactly one place. Store a customer's address once, and reference it by key everywhere else. Change the address in one row and it's correct everywhere — no risk of three tables disagreeing. This is what keeps operational systems consistent under heavy writes.
2. Denormalization — Trading Space for Speed
Analytics flips the priority. Joins across many normalized tables are expensive when you're scanning billions of rows. So we deliberately duplicate data into wider tables, trading cheap storage for fast reads. Storage is cheap (Part 2); analyst time and query latency are not.
3. The Conceptual → Logical → Physical Ladder
Modeling moves through three levels: conceptual (the business entities — customers, orders, products), logical (tables, columns, keys, relationships, independent of any platform), and physical (the actual DDL, types, partitioning on a specific engine). Good models are designed at the logical level first — which is exactly why they transfer across platforms.
Same Data, Two Shapes
The same business facts, reshaped from many write-friendly tables into a read-friendly model.
Same Pattern, Every Platform
Modeling is designed at the logical level, so it's almost entirely platform-independent. Only the physical DDL changes:
| The Concept | Snowflake | Databricks | BigQuery | Microsoft Fabric |
|---|---|---|---|---|
| Logical model | Identical — entities, keys, relationships are platform-agnostic | |||
| Build the model | SQL / dbt | SQL / dbt / Spark | SQL / dbt | SQL / Dataflows Gen2 / Notebooks |
| Semantic layer | dbt Semantic Layer | Unity Catalog metrics / dbt | Looker / dbt | Power BI semantic model |
Feature names evolve — treat this as a capability map, and confirm specifics against current vendor docs.
The takeaway: data modeling is the most transferable skill in this whole series. A good logical model — the right entities, keys, and grain — is correct on every platform. Learn to model, and you're not learning Snowflake or Fabric; you're learning something that outlives both.