Returns Detector

A refund analytics dashboard that surfaces problematic products and serial returners across three sales channels and seven marketplaces, so the operations team can act before losses accumulate.

The Problem

Desertcart operates across multiple sales channels: its own direct platform, seven marketplace sub-channels (Amazon AE, SA, IN, ZA, Noon AE, OnBuy, Walmart CA), and a Shopify-based wellness brand (Hewyn). With thousands of orders flowing through these channels, identifying which products have abnormally high return rates and which customers are serial returners was entirely manual and reactive. Someone would notice a product seemed to come back a lot, or a finance report would flag an unusual refund spike, and then the investigation would start from scratch.

By the time anyone noticed a pattern, the losses had already accumulated. There was no systematic way to answer basic questions: “Which products have the highest refund rate this quarter?” “Are there customers who return more than they keep?” “Is our refund rate trending up or down by category?”

The Approach

I built a single-purpose analytics dashboard with five views that give the operations team everything they need to act on refund patterns:

Dashboard KPIs show the headline numbers: total refunds, refund rate by channel, trend direction, and channel-level breakdowns. This is the “is something wrong?” view.

Products view ranks products by refund rate with severity flags. Minimum thresholds filter out statistical noise (a product needs at least 2 total items sold to appear), so the team isn’t chasing one-off returns. Category breakdowns show whether a problem is isolated to a specific product or systemic across a category.

Users view identifies serial returners. Again, minimum thresholds (at least 2 orders) prevent false positives. The team can see a customer’s full order and refund history to decide whether to flag the account.

Reasons view categorizes every refund into four standard buckets: OUR_FAULT, PRODUCT_ISSUES, CUSTOMER, and UNKNOWN. Raw refund reasons from each channel get normalized through a categorization pipeline so cross-channel comparison is meaningful. “Item arrived damaged” from Shopify and “defective product” from Amazon both map to PRODUCT_ISSUES.

Decisions view ties into a separate alerts dashboard where Account Managers can resolve problematic orders inline, with 17 categorized resolution reasons and persistent overrides that exclude resolved orders from future scans.

Key Decisions

Multi-channel normalization into a common structure. Each channel has its own data source and format: Metabase SQL queries for desertcart, Shopify REST API for Hewyn, TSV files for marketplace data. The app normalizes everything into a common analytics structure with USD currency conversion (AED to USD for Hewyn, subunit-aware conversion for desertcart). Without this normalization layer, cross-channel comparison would be meaningless.
Minimum thresholds for statistical validity. Products need at least 2 total items and users need at least 2 orders to appear in “top offenders” lists. This eliminates the noise from one-off returns that would otherwise dominate the rankings and waste the team’s time investigating non-patterns.
Reason categorization pipeline over raw strings. Raw refund reasons are free text from different systems with different conventions. The normalization functions map these into four standard categories, making it possible to ask “what percentage of our returns are our fault?” across all channels in a single number.
Scale-to-zero deployment on Fly.io. This is an internal tool with intermittent usage. It might get heavy use during a monthly review and then sit idle for days. Fly.io with min_machines_running=0 and 512MB RAM means it sleeps when unused and wakes on request. The idle cost is zero.
Single-file Flask with vanilla JavaScript. No React, no build step, no bundler. A single Python file (roughly 3,000 lines) serves both the API and the frontend. For an internal tool with a small user base, this is the right level of complexity. It deploys instantly, is trivial to debug, and anyone on the team can read it.

Under the Hood

The data pipeline normalizes three different source formats into a common analytics structure:

┌─────────────────────┐  ┌─────────────────────┐  ┌─────────────────────┐
│   Desertcart        │  │   Hewyn (Shopify)   │  │   Marketplace       │
│   Metabase SQL      │  │   REST API          │  │   TSV Files         │
│   15+ currencies    │  │   AED only          │  │   7 sub-channels    │
│   subunit-aware     │  │   AED→USD convert   │  │   Amazon/Noon/etc   │
└────────┬────────────┘  └────────┬────────────┘  └────────┬────────────┘
         │                        │                        │
         └────────────┬───────────┴────────────────────────┘
                      ▼
         ┌────────────────────────────────┐
         │  Normalized Analytics Layer    │
         │  Common structure, USD base    │
         │  Standardized reason codes     │
         │  (OUR_FAULT / PRODUCT_ISSUES   │
         │   / CUSTOMER / UNKNOWN)        │
         └────────────┬───────────────────┘
                      ▼
         ┌────────────────────────────────┐
         │  5 Analytics Views             │
         │  Dashboard · Products · Users  │
         │  Reasons · Decisions           │
         └────────────────────────────────┘

The reason categorization pipeline normalizes free-text refund reasons:

def categorize_reason(raw_reason: str) -> str:
    """
    Maps raw refund reasons from any channel
    into four standard categories.

    "Item arrived damaged"     → OUR_FAULT
    "Wrong item shipped"       → OUR_FAULT
    "Product not as described" → PRODUCT_ISSUES
    "Changed my mind"          → CUSTOMER
    "No reason given"          → UNKNOWN
    """

The alerts resolution workflow lets Account Managers act directly:

resolution_reasons = [
    "Cancel Order",
    "Needs Reconciliation",
    # ... 17 categorized reasons total
]

# Resolved orders get persistent overrides
# excluded from future analytics scans

Scale

3 sales channels analyzed (desertcart direct, marketplace, Hewyn/Shopify)
7 marketplace sub-channels (Amazon AE/SA/IN/ZA, Noon AE, OnBuy, Walmart CA)
5 analytical views (Dashboard, Products, Users, Reasons, Decisions)
4 standardized refund reason categories
17 resolution reason types in the alerts workflow
Scale-to-zero deployment on Fly.io (zero cost when idle)

What I Learned

Real company data is messy in ways that are hard to appreciate until you’re in it. A refund isn’t just a refund. It might be a sizing issue, a wrong item shipped, a product damaged in transit to a country with extreme heat, a customer who ordered the wrong variant, a marketplace policy abuse, or a legitimate quality defect. Each of those requires a different operational response. Delisting a product because of high returns makes sense if the product is defective. It’s the wrong call if the returns are driven by incorrect shipment or misleading product information that you can fix.

The counting bugs made this concrete. Total items counted each line item as 1 while refunds correctly used quantities. A customer orders 3 of the same product, returns all 3. The system saw 1 item sold, 3 refunded: a 300% refund rate. Every metric downstream was inflated. The fix was trivial, but finding it required tracing the entire pipeline end to end. And that was just one example. Currency subunits, marketplace-specific refund policies, shipping delays that look like returns, country-specific customs rejections that get logged as customer refunds. Each layer adds complexity that a clean spreadsheet hides.

This is what makes operational analytics fundamentally different from product analytics. Product analytics deals with clean events: a user clicked, viewed, converted. Operational analytics deals with the physical world where a package sits in Dubai customs for 3 weeks, arrives damaged, and the refund gets logged against the wrong reason code because the support agent picked the first option in the dropdown. Making good decisions from this data requires knowing the domain well enough to understand what the numbers actually mean, not just what they say.

Without that domain knowledge, the natural instinct is to oversimplify. High refund rate? Delist the product. Serial returner? Flag the account. But the right answer depends on context that only exists in the operational reality of running cross-border commerce across seven marketplaces in four countries. The dashboard surfaces the patterns. Domain expertise decides what to do about them.

Built for desertcart operations team. Architecture and analytical views shown. All order volumes and revenue figures anonymized.

The Problem

The Approach

Key Decisions

Under the Hood

Scale

What I Learned

Other projects

AI Customer Support Assistant

Hewyn AI Agent

Data-Driven Marketing Strategy