← Back to Descriptive Analytics

Patterns & Insights

From "What Happened" to "What It Means"

Every step so far has organized and displayed the data. This step interprets it. Patterns are the recurring structures in the data — correlations, seasonality, anomalies, and relationships between variables. Insights are the so-what: the conclusions that change a decision. This is the payoff of descriptive analytics, and it is also the bridge to predictive work, because a pattern worth acting on is often a pattern worth modelling.

Detect the pattern, then state the insight Analyzed data metrics · trends · segments Patterns correlation · anomaly Insight the decision-changing so-what

A pattern is a structure in the data; an insight is the conclusion that changes a decision.

Kinds of Patterns to Look For

Correlation Is Not Causation

The most important discipline in this step: a relationship in the data does not prove one thing causes the other. Heavy discounting may correlate with high volume, but it may also correlate with low margin — and the cause could be a third factor entirely. Treat patterns as hypotheses to test, not conclusions to ship.

In Python — correlations and anomalies

Python · pandas

import pandas as pd

# 1. Correlation between the numeric drivers
corr = orders[["sales", "quantity", "discount", "profit"]].corr()
print(corr["profit"].sort_values())   # how each driver relates to profit

# 2. Pareto concentration: do 20% of customers drive ~80% of sales?
cust = orders.groupby("customer_id")["sales"].sum().sort_values(ascending=False)
share = cust.cumsum() / cust.sum()
top20 = share.iloc[: max(1, int(0.20 * len(cust))) - 1].iloc[-1]
print(f"Top 20% of customers = {top20:.0%} of sales")

# 3. Anomaly flag: months more than 2 std dev from the mean
monthly = orders.set_index("order_date").resample("ME")["sales"].sum()
z = (monthly - monthly.mean()) / monthly.std()
print(monthly[z.abs() > 2])

In SQL — flag the outlier periods

SQL

WITH monthly AS (
  SELECT DATE_TRUNC('month', order_date) AS month, SUM(sales) AS sales
  FROM orders GROUP BY DATE_TRUNC('month', order_date)
),
stats AS (
  SELECT AVG(sales) AS mu, STDDEV(sales) AS sd FROM monthly
)
SELECT m.month, m.sales,
       ROUND((m.sales - s.mu) / NULLIF(s.sd, 0), 2) AS z_score
FROM monthly m, stats s
WHERE ABS((m.sales - s.mu) / NULLIF(s.sd, 0)) > 2     -- anomalous months
ORDER BY m.month;

Tools Commonly Used

Best Practices

Example: Insights from the Retail Data

For the retail dataset, this step might surface:

← Back to Descriptive Analytics