← Back to Descriptive Analytics

Transforming & Aggregation

Making Data Ready for Analysis

After collecting and cleaning data, the next step is Transforming and Aggregating. At this stage, raw but clean data is reshaped into structures that are easier to analyze and interpret. Transformation focuses on adjusting the format, scale, and relationships within the dataset — for example, converting categorical text into numerical codes, normalizing values to ensure comparability, or merging different sources into a unified dataset. This process often involves feature engineering, where new variables are created to capture additional insights that may not be directly available from the raw data.

Aggregation, on the other hand, condenses detailed records into higher-level summaries that highlight key trends and reduce complexity. This could mean grouping transactions by month, calculating averages or totals per category, or computing rolling metrics like moving averages to smooth fluctuations. The combination of transformation and aggregation not only makes large datasets more manageable but also ensures that the analysis focuses on metrics that are consistent, relevant, and aligned with business objectives. In essence, this step bridges the gap between raw data and actionable insights, preparing the ground for advanced analytics, visualization, and reporting.

Reshape the rows, then summarize them Clean rows transaction level Transform derive · encode join · scale Aggregated metrics by month · region

Transformation engineers new fields and reshapes the data; aggregation condenses it into the metrics that matter.

Data Transformation

Transformation involves converting data into a format suitable for analysis. Common techniques include:

Data Aggregation

Aggregation helps summarize data into higher-level views:

In Python — feature engineering and aggregation

Python · pandas

import pandas as pd

# Feature engineering: derive unit price, revenue and profit per order
price = {"Furniture": 180, "Office": 25, "Technology": 320}
orders["unit_price"] = orders["category"].map(price)
orders["sales"]  = orders["quantity"] * orders["unit_price"] * (1 - orders["discount"])
orders["profit"] = orders["sales"] * (0.30 - orders["discount"])   # margin shrinks with discount
orders["month"]  = orders["order_date"].dt.to_period("M").dt.to_timestamp()

# Aggregation: monthly sales and profit by region
monthly = (orders.groupby(["month", "region"])
                 .agg(sales=("sales", "sum"),
                      profit=("profit", "sum"),
                      orders=("order_id", "nunique"))
                 .reset_index())
print(monthly.head())

In SQL — derive and aggregate

SQL

SELECT
  DATE_TRUNC('month', order_date)            AS month,
  region,
  SUM(quantity * unit_price * (1 - discount)) AS sales,
  SUM(quantity * unit_price * (1 - discount) * (0.30 - discount)) AS profit,
  COUNT(DISTINCT order_id)                    AS orders
FROM staging.orders_clean
GROUP BY DATE_TRUNC('month', order_date), region
ORDER BY month, region;

Tools for Transformation & Aggregation

Best Practices

Example: Retail Sales Transformation

For the retail dataset, transformations may include deriving sales and profit from quantity, price, and discount; mapping each category to a unit price; and extracting the order month. Aggregations could then include:

← Back to Descriptive Analytics