A weekly AI merchandising system that scans 4,500+ products across 78 collections, generates evidence-backed proposals, and presents them for human approval — shifting the merchandiser from “what should I look at?” to “approve or reject.”

The Problem

Manual merchandising at 4,500+ products doesn’t happen slowly. It doesn’t happen at all. A merchandiser with full context of every collection, every product’s conversion rate, every search term returning zero results, every bundle opportunity, and every OOS slot on the homepage would need to check all of it every week to do the job correctly.

In practice this means important things go unnoticed:

Products with high conversion rates but low visibility never get promoted
OOS products sit on the homepage, wasting prime placement
Search terms with real demand return zero results because the right collection doesn’t exist
Co-purchased product pairs that would make natural bundles never get surfaced

The merchandiser’s time goes to the things they happen to notice, not the things that matter most.

The Approach

A four-stage pipeline: Analyst → Strategist → Human Review → Executor.

Analyst — 7 Scanners

Each scanner owns a specific area and writes its findings to Redis:

Collections — OOS rates, dead weight, overstuffed collections; ranked by urgency.
Product Pages — short descriptions, missing images, missing tags.
Homepage — OOS featured products, top converters not featured, poor converters currently featured, hero freshness.
Product Intelligence — identifies Hidden Gems: products with high conversion rate but low view count. First run found 182. These are products already converting well when people find them — the problem is visibility, not the product.
Search Intelligence — zero-result search terms, collection gaps, brand search opportunities. Cross-references against a DMCA-excluded brand list so suggestions don’t touch restricted brands.
Bundles — co-purchased product pairs with lift calculation, cross-category bundle opportunities.
Search (Synonyms) — typo synonyms validated against the storefront search API.

Strategist

After the scanners run, the Strategist reads all findings plus competitor prices from the Category Agent Fleet and GA4 performance data, then generates proposals using the Anthropic SDK (claude-sonnet-4, max_tokens=8192).

Each proposal includes an evidence pack explaining why the change is worth making. The Strategist self-evaluates each proposal on a 1–5 scale and drops anything scoring below 3 before it reaches the human review queue. It’s also nav-aware — it never suggests archiving collections that appear in top-level navigation, since those changes have site-wide consequences.

Human Review

The dashboard presents findings as FindingCard components: problem → evidence → proposed fix → Approve / Reject / Adapt.

“Adapt” works on any finding, even ones without a pre-generated proposal. The merchandiser can type a modification, and the AI synthesizes a revised proposal from the original finding plus the new input. Expandable cards show the affected products with direct links to their Shopify admin pages so approval doesn’t require context-switching.

Executor

On approval, the executor writes changes back to Shopify. Some paths are fully automated — homepage collection updates for quality-gated products go through without manual intervention. Others (creating new collections, reordering nav) are queued for the next partial-automation milestone.

Intelligence Layer

After each weekly run, the system writes a digest summarizing what was found, what was approved, and tracking resolution rates over time. This feeds back into future Strategist runs so it knows which types of proposals the merchandiser tends to accept.

Cross-System Data Flow

The merchandising fleet is intentionally downstream from the category fleet:

Category Agent Fleet
  fleet:price:{product_id}        → competitor pricing
  fleet:portfolio:report          → category balance data

Merchandising Fleet reads both, adds:
  merch:scan:collections
  merch:scan:homepage
  merch:scan:product_intelligence
  merch:scan:search_intelligence
  merch:proposals:{id}
  merch:findings:{id}

Proposals that involve pricing decisions incorporate the latest competitor prices from the category fleet. Portfolio balance findings from the category review inform which collection gaps are strategically important versus incidental.

Key Decisions

Strategist self-eval before human review. Every proposal that reaches the review queue has already been filtered by the AI. The merchandiser only sees proposals scoring ≥3/5. Low-quality suggestions never make it to the human step.
“Adapt” on every finding, not just proposals. The original design only allowed adapting findings that had a pre-generated proposal. This left a gap: scanners often surface important problems but the right solution requires human context. The adapt flow means any finding can be acted on — even ones where the Strategist didn’t generate a proposal.
Max tokens 8192 on the Strategist. The initial configuration at 4096 tokens caused truncated proposal outputs — the Strategist would start a proposal and cut off mid-reasoning. At 8192, full reasoning chains and evidence packs come through complete.
DMCA brand exclusion as a hardcoded guard. Some brands have active takedown notices or distribution restrictions. The search intelligence scanner uses a hardcoded exclusion set to ensure it never surfaces these as expansion opportunities. A management UI for the full list is on the roadmap.

Scale

7 scanners covering collections, pages, homepage, product intelligence, search, and bundles
4,500+ products scanned weekly
78 collections analyzed for health and composition
182 hidden-gem products identified (high conversion rate, low views)
Weekly cadence: runs Monday 06:00 GST, findings ready for morning review

Built for Hewyn (hewyn.com), a DTC supplement brand in the UAE. Architecture shown. Conversion data and revenue figures anonymized.

Merchandising Agent Fleet