Patterns & Insights

From "What Happened" to "What It Means"

Every step so far has organized and displayed the data. This step interprets it. Patterns are the recurring structures in the data — correlations, seasonality, anomalies, and relationships between variables. Insights are the so-what: the conclusions that change a decision. This is the payoff of descriptive analytics, and it is also the bridge to predictive work, because a pattern worth acting on is often a pattern worth modelling.

A pattern is a structure in the data; an insight is the conclusion that changes a decision.

Kinds of Patterns to Look For

Correlation: two variables that move together (discount and volume, or discount and margin).
Seasonality: repeating cycles by week, month, or quarter.
Anomalies: points that break the pattern — a sudden spike or a region falling off a cliff.
Concentration: the Pareto effect, where a small share of items drives most of the result.

Correlation Is Not Causation

The most important discipline in this step: a relationship in the data does not prove one thing causes the other. Heavy discounting may correlate with high volume, but it may also correlate with low margin — and the cause could be a third factor entirely. Treat patterns as hypotheses to test, not conclusions to ship.

In Python — correlations and anomalies

Python · pandas

import pandas as pd

# 1. Correlation between the numeric drivers
corr = orders[["sales", "quantity", "discount", "profit"]].corr()
print(corr["profit"].sort_values())   # how each driver relates to profit

# 2. Pareto concentration: do 20% of customers drive ~80% of sales?
cust = orders.groupby("customer_id")["sales"].sum().sort_values(ascending=False)
share = cust.cumsum() / cust.sum()
top20 = share.iloc[: max(1, int(0.20 * len(cust))) - 1].iloc[-1]
print(f"Top 20% of customers = {top20:.0%} of sales")

# 3. Anomaly flag: months more than 2 std dev from the mean
monthly = orders.set_index("order_date").resample("ME")["sales"].sum()
z = (monthly - monthly.mean()) / monthly.std()
print(monthly[z.abs() > 2])

In SQL — flag the outlier periods

SQL

WITH monthly AS (
  SELECT DATE_TRUNC('month', order_date) AS month, SUM(sales) AS sales
  FROM orders GROUP BY DATE_TRUNC('month', order_date)
),
stats AS (
  SELECT AVG(sales) AS mu, STDDEV(sales) AS sd FROM monthly
)
SELECT m.month, m.sales,
       ROUND((m.sales - s.mu) / NULLIF(s.sd, 0), 2) AS z_score
FROM monthly m, stats s
WHERE ABS((m.sales - s.mu) / NULLIF(s.sd, 0)) > 2     -- anomalous months
ORDER BY m.month;

Tools Commonly Used

Python: Pandas corr, statsmodels for seasonal decomposition, SciPy for tests.
SQL: STDDEV, window functions, and threshold filters for anomaly detection.
BI: Power BI Key Influencers, Tableau clustering and trend models.

Best Practices

Frame each pattern as a testable hypothesis before calling it an insight.
Quantify the insight — "the top 20% of customers drive 78% of sales" beats "a few customers matter".
Distinguish correlation from causation explicitly in the write-up.
Tie every insight back to a decision; an insight no one can act on is trivia.

Example: Insights from the Retail Data

For the retail dataset, this step might surface:

A negative correlation between discount depth and profit margin, strongest in one category.
A clear Q4 seasonal pattern that explains most of the year-over-year growth.
A Pareto concentration: a small set of customers driving the majority of revenue — a retention priority.

← Back to Descriptive Analytics