Profit Erosion from E-Commerce Returns
MDA Capstone Project — University of Niagara Falls Canada | DAMO-699-4 | Winter 2026
This capstone project quantifies the financial losses caused by product returns in e-commerce through
a multi-method analytics framework. The core concept: Profit Erosion = Margin Reversal + Processing Cost.
Using Google BigQuery's thelook_ecommerce dataset, the project identifies which product
categories, brands, and customer segments drive the most erosion — and builds predictive models to
flag high-risk customers before losses occur.
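The core formula can be illustrated with a minimal sketch for a single returned item. The field names (`sale_price`, `cost`, `processing_cost`) and the sample values are assumptions made for illustration, not the project's actual schema or code; margin reversal is taken here to be the item's sale price minus its cost.

```python
from dataclasses import dataclass


@dataclass
class ReturnedItem:
    sale_price: float       # price the customer paid (hypothetical field name)
    cost: float             # cost of goods for the item (hypothetical field name)
    processing_cost: float  # handling cost for this return (assumed value)


def profit_erosion(item: ReturnedItem) -> float:
    """Profit Erosion = Margin Reversal + Processing Cost."""
    # Margin reversal: the margin that is lost when the sale is undone.
    margin_reversal = item.sale_price - item.cost
    return margin_reversal + item.processing_cost


# Illustrative numbers only: an $80 item that cost $45, returned at a $12 handling cost.
item = ReturnedItem(sale_price=80.0, cost=45.0, processing_cost=12.0)
print(profit_erosion(item))  # 47.0
```

Summing this quantity per category, brand, or customer would give the aggregates the research questions compare.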
Research Questions & Methods
- RQ1 — Category & brand differences in erosion: Kruskal-Wallis test (p = 2.63 × 10⁻³³). Top erosion categories: Outerwear/Coats (~$2,000), Sweaters (~$1,600), Jeans (~$1,400).
- RQ2 — Customer behavioral segmentation: K-Means clustering · Gini coefficient: 0.409. Distinct segments identified by return behavior and erosion contribution.
- RQ3 — Predictive modeling for high-erosion customers: Random Forest classifier · AUC = 0.9798. Enables proactive intervention before losses accumulate.
- RQ4 — Behavioral drivers of erosion: Log-linear regression · R² = 0.7188. Return frequency is the strongest predictor (r = 0.614).
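As a rough illustration of the concentration measure reported for RQ2, the Gini coefficient of per-customer erosion can be computed from sorted values. This is a generic textbook formulation in plain Python, not necessarily the project's implementation; the function name `gini` and the sample inputs are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0 means erosion is spread evenly across customers; values near 1
    mean a few customers account for almost all of it.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


print(gini([10, 10, 10, 10]))  # 0.0  (perfectly even)
print(gini([0, 0, 0, 40]))     # 0.75 (one customer carries everything)
```

On this scale, the reported value of 0.409 indicates moderate concentration of erosion among customers.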
Tools & Techniques
- Languages & Libraries: Python 3.11+, Pandas, NumPy, scikit-learn, statsmodels
- Visualization: Matplotlib, Seaborn
- Dashboard: Streamlit (5 interactive pages, live deployment)
- Data source: Google BigQuery (thelook_ecommerce dataset)
- Statistical tests: Kruskal-Wallis, log-linear regression, K-Means clustering
- ML model: Random Forest classification with AUC evaluation
- Testing & CI/CD: 512+ pytest unit tests, GitHub Actions pipeline
- Validation: External dataset (School Specialty LLC) for cross-domain confirmation
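To make the AUC evaluation concrete, here is a small dependency-free sketch of what the metric measures: the probability that a randomly chosen high-erosion customer receives a higher score than a randomly chosen low-erosion one. In practice scikit-learn's `roc_auc_score` would normally be used; the function name `roc_auc` and the labels and scores below are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1 = high-erosion customer, 0 = other; scores are hypothetical model outputs.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of customers, so it suits small checks rather than full evaluation runs.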
Key Insights
- Top 20% of customers generate 47.6% of total profit erosion, a Pareto-style concentration, though weaker than the classic 80/20 split.
- A tiered processing cost model ($12–$15.60 per return by category) shows that operational costs add materially to the losses from margin reversal.
- Return frequency is the single strongest behavioral predictor of erosion (r = 0.614).
- The Random Forest model achieves AUC = 0.9798, enabling reliable early identification of high-erosion customers.
- Directional patterns hold on an external B2B dataset (School Specialty LLC), which supports the generalizability of the framework.
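The concentration figure in the first insight can be reproduced in principle with a short calculation: rank customers by erosion, take the top fraction, and divide their erosion by the total. The sketch below assumes a plain list of per-customer erosion amounts; the function name `top_share` and the sample data are hypothetical.

```python
def top_share(erosion_by_customer, fraction=0.2):
    """Share of total erosion generated by the top `fraction` of customers."""
    values = sorted(erosion_by_customer, reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    # Number of customers in the top group, rounded to the nearest whole
    # customer and never fewer than one.
    k = max(1, round(fraction * len(values)))
    return sum(values[:k]) / total


# Ten hypothetical customers; the top two account for 120 of 200 in erosion.
erosion = [70, 50, 20, 15, 15, 10, 10, 5, 3, 2]
print(top_share(erosion))  # 0.6
```

Applied to the project's customer-level erosion totals, the same calculation would be expected to yield the 47.6% figure reported above.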
Business Impact
The framework provides e-commerce operators with a replicable methodology to measure, segment, and predict profit erosion from returns. Actionable outputs include targeted retention strategies for high-erosion customer segments, category-level pricing adjustments, and automated flagging of at-risk customers — directly supporting revenue protection decisions.