All Projects

Customer LTV Estimation

First-Purchase LTV Predictor

PythonSQLMetabase APIShopify API

A system for estimating the lifetime value of a customer at the moment of their first purchase, designed to feed directly into ad platform bidding strategies and CRM segmentation.

The Problem

When a new customer places their first order, we know almost nothing about their long-term value. But the decisions we make in the first 48 hours, how much we bid to acquire similar customers, which email flow they enter, whether they get a follow-up offer, all depend on predicting what that customer will be worth over time.

Most LTV models require months of purchase history. By then, the window for acquisition optimization has closed. We needed a model that could estimate LTV from first-purchase signals: order value, product category, acquisition channel, market, device, and time to first purchase from first site visit.

The Approach

I’m building a prediction model that uses historical purchase data from Desertcart’s database to identify patterns in first-order characteristics that correlate with long-term value. The model segments customers into LTV tiers at the point of first purchase.

The output feeds two systems: ad platforms (so we can bid more aggressively for customer profiles that historically become high-value) and CRM (so first-time buyers enter email flows calibrated to their predicted value, not a generic welcome sequence).

The data pipeline pulls from the production database through the Metabase API and enriches it with Shopify order metadata. The model trains on cohorted historical data with proper holdout validation to avoid overfitting to seasonal patterns.

Key Decisions

  • First-purchase signals only, not behavioral enrichment. The model deliberately uses only data available at the moment of first purchase. Adding browse history or email engagement would improve accuracy but would also mean the prediction isn’t available when we need it most: immediately after the first order.

  • LTV tiers over point estimates. A precise dollar figure implies false confidence. Bucketing customers into tiers (high, medium, low predicted value) is more honest and more useful for the downstream systems that consume the output. Ad platforms and email flows work in segments, not in individual dollar amounts.

  • Designed for integration, not insight. This isn’t a dashboard or a report. The output is a segment assignment that gets written back into customer records and pushed to ad platforms. The value is in the automation, not in someone looking at a chart.

Status

Currently in development. Historical data analysis is complete. Model training and validation in progress.


Being built for Desertcart, a cross-border ecommerce marketplace. Architecture shown. Customer data anonymized.