Patterns & Insights
From "What Happened" to "What It Means"
Every step so far has organized and displayed the data. This step interprets it. Patterns are the recurring structures in the data — correlations, seasonality, anomalies, and relationships between variables. Insights are the so-what: the conclusions that change a decision. This is the payoff of descriptive analytics, and it is also the bridge to predictive work, because a pattern worth acting on is often a pattern worth modelling.
A pattern is a structure in the data; an insight is the conclusion that changes a decision.
Kinds of Patterns to Look For
- Correlation: two variables that move together (discount and volume, or discount and margin).
- Seasonality: repeating cycles by week, month, or quarter.
- Anomalies: points that break the pattern — a sudden spike or a region falling off a cliff.
- Concentration: the Pareto effect, where a small share of items drives most of the result.
Correlation Is Not Causation
The most important discipline in this step: a relationship in the data does not prove one thing causes the other. Heavy discounting may correlate with high volume, but it may also correlate with low margin — and the cause could be a third factor entirely. Treat patterns as hypotheses to test, not conclusions to ship.
In Python — correlations and anomalies
Python · pandas
import pandas as pd
# 1. Correlation between the numeric drivers
corr = orders[["sales", "quantity", "discount", "profit"]].corr()
print(corr["profit"].sort_values()) # how each driver relates to profit
# 2. Pareto concentration: do 20% of customers drive ~80% of sales?
cust = orders.groupby("customer_id")["sales"].sum().sort_values(ascending=False)
share = cust.cumsum() / cust.sum()
top20 = share.iloc[: max(1, int(0.20 * len(cust))) - 1].iloc[-1]
print(f"Top 20% of customers = {top20:.0%} of sales")
# 3. Anomaly flag: months more than 2 std dev from the mean
monthly = orders.set_index("order_date").resample("ME")["sales"].sum()
z = (monthly - monthly.mean()) / monthly.std()
print(monthly[z.abs() > 2])
In SQL — flag the outlier periods
SQL
WITH monthly AS (
SELECT DATE_TRUNC('month', order_date) AS month, SUM(sales) AS sales
FROM orders GROUP BY DATE_TRUNC('month', order_date)
),
stats AS (
SELECT AVG(sales) AS mu, STDDEV(sales) AS sd FROM monthly
)
SELECT m.month, m.sales,
ROUND((m.sales - s.mu) / NULLIF(s.sd, 0), 2) AS z_score
FROM monthly m, stats s
WHERE ABS((m.sales - s.mu) / NULLIF(s.sd, 0)) > 2 -- anomalous months
ORDER BY m.month;
Tools Commonly Used
- Python: Pandas
corr, statsmodels for seasonal decomposition, SciPy for tests. - SQL:
STDDEV, window functions, and threshold filters for anomaly detection. - BI: Power BI Key Influencers, Tableau clustering and trend models.
Best Practices
- Frame each pattern as a testable hypothesis before calling it an insight.
- Quantify the insight — "the top 20% of customers drive 78% of sales" beats "a few customers matter".
- Distinguish correlation from causation explicitly in the write-up.
- Tie every insight back to a decision; an insight no one can act on is trivia.
Example: Insights from the Retail Data
For the retail dataset, this step might surface:
- A negative correlation between discount depth and profit margin, strongest in one category.
- A clear Q4 seasonal pattern that explains most of the year-over-year growth.
- A Pareto concentration: a small set of customers driving the majority of revenue — a retention priority.