Comparing
A Number Only Means Something Next to Another Number
"$2.4M in sales" is just a figure until you ask: compared to what? Last year? The target? The other region? Comparison is the step that turns measurements into judgements. It evaluates performance across dimensions — time periods, categories, segments, geographies — to expose what is growing, what is lagging, and what is simply different. Almost every business insight is, at heart, a comparison.
Comparison aligns two like-for-like figures and reports the difference that matters.
Common Types of Comparison
- Period-over-period: this month vs last month, or year-over-year (YoY) to strip out seasonality.
- Category vs category: which product line, region, or segment outperforms the others.
- Actual vs target: performance against a plan, budget, or benchmark.
- Cohort vs cohort: the segments built in the previous step, compared side by side.
The Golden Rule: Compare Like with Like
A comparison is only honest when both sides are measured the same way. Align the time windows (a partial month vs a full month is meaningless), the units, and the filters. Express the difference as both an absolute change and a percentage change — each tells a different part of the story.
In Python — period-over-period change
Python · pandas
import pandas as pd
# Monthly sales, then month-over-month and year-over-year change
monthly = (orders
.assign(month=orders["order_date"].dt.to_period("M"))
.groupby("month")["sales"].sum()
.rename("sales").to_frame())
monthly["mom_pct"] = monthly["sales"].pct_change() * 100 # vs last month
monthly["yoy_pct"] = monthly["sales"].pct_change(periods=12) * 100 # vs last year
# Category comparison as a pivot
by_cat = orders.pivot_table(index="category", columns="region",
values="sales", aggfunc="sum")
print(monthly.tail()); print(by_cat)
In SQL — LAG for period-over-period
SQL
WITH monthly AS (
SELECT DATE_TRUNC('month', order_date) AS month,
SUM(sales) AS sales
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
)
SELECT
month,
sales,
LAG(sales) OVER (ORDER BY month) AS prev_month,
ROUND(100.0 * (sales - LAG(sales) OVER (ORDER BY month))
/ NULLIF(LAG(sales) OVER (ORDER BY month), 0), 1) AS mom_pct
FROM monthly
ORDER BY month;
Tools Commonly Used
- Python: Pandas
pct_change,pivot_table,diff,merge. - SQL:
LAG/LEADwindow functions and self-joins. - Power BI / Tableau: time-intelligence DAX (
SAMEPERIODLASTYEAR), table calculations.
Best Practices
- Always state the baseline explicitly — "vs the same month last year", not just "+12%".
- Guard against divide-by-zero with
NULLIF(SQL) or by filtering zero baselines (pandas). - Report absolute and relative change together; a 200% jump on a tiny base can mislead.
- Normalize for different population sizes (per-customer, per-store) when comparing unequal groups.
Example: Comparing the Retail Regions
For the retail dataset, comparison might reveal:
- Year-over-year sales up 12% overall, but flat in one region that is dragging the average.
- One category growing while another quietly declines — invisible in the headline total.
- The "discount-driven" segment from the clustering step growing fastest in revenue but slowest in profit.