AI Product Photography

A Python pipeline that generates editorial-quality product photography using the Gemini API, producing 4 images per product (across 3 shot types) through an archetype-based prompting system.

The Problem

Product photography for an ecommerce brand is expensive and repetitive. Every new SKU needs the same set of shots: clean product-on-white shots for the product detail page (PDP), an in-situ shot showing the product in context, and a lifestyle image for social and ads. Hiring a photographer for every product launch meant delays and costs that didn’t scale, especially with new products arriving regularly. The images weren’t creative or editorial in the traditional sense: they were structured, predictable, and followed a clear visual formula. That’s exactly the kind of work where AI generation makes sense.

The Approach

I built a Python pipeline that takes a product name, category, and a reference image, then generates 4 images per product through the Gemini API: 2 PDP-style product shots (clean background, different angles), 1 in-situ image (product in a realistic setting like a kitchen counter or bathroom shelf), and 1 lifestyle image (product integrated into a scene with people or activities).
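The four output slots can be pictured as a fixed shot plan that every product runs through. This is a minimal sketch, assuming a hypothetical `ProductInput` record and `ShotSpec` plan; the angle labels (`front`, `three_quarter`) are illustrative placeholders, not the pipeline's actual values.

```python
from dataclasses import dataclass
from enum import Enum


class ShotType(Enum):
    """The three shot types the pipeline fills for every product."""
    PDP = "pdp"              # clean background, product only
    IN_SITU = "in_situ"      # product in a realistic setting
    LIFESTYLE = "lifestyle"  # product in a scene with people or activities


@dataclass(frozen=True)
class ShotSpec:
    shot_type: ShotType
    variant: str  # hypothetical label distinguishing shots of the same type


# Fixed plan: 2x PDP (different angles), 1x in-situ, 1x lifestyle.
SHOT_PLAN: tuple[ShotSpec, ...] = (
    ShotSpec(ShotType.PDP, "front"),
    ShotSpec(ShotType.PDP, "three_quarter"),
    ShotSpec(ShotType.IN_SITU, "default"),
    ShotSpec(ShotType.LIFESTYLE, "default"),
)


@dataclass(frozen=True)
class ProductInput:
    name: str
    category: str
    reference_image: str  # path to the product's reference photo


def plan_shots(product: ProductInput) -> list[tuple[ProductInput, ShotSpec]]:
    """Pair the product with every slot in the fixed shot plan."""
    return [(product, spec) for spec in SHOT_PLAN]
```

Keeping the plan as a module-level constant means every product fills exactly the same slots, so a new SKU never needs per-product configuration beyond its name, category, and reference image.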

The prompting system is built around archetypes. Instead of writing unique prompts for every product, I defined visual archetypes for product categories. A sleep supplement gets warm, dim lighting with bedroom contexts. An energy product gets bright, morning-light aesthetics. A gut health product gets clean, kitchen-adjacent settings. The archetype determines lighting, color palette, setting, and mood. The product-specific details (bottle shape, label colors, size) come from the reference image and product metadata.
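One way to express the archetype idea is a lookup table keyed by category, merged with product details at prompt time. This is a sketch under stated assumptions: the `Archetype` fields mirror the four attributes named above, but the category keys, the descriptive wording, the `SCENES` templates, and `build_prompt` are hypothetical illustrations, not the pipeline's real prompts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archetype:
    """Visual treatment shared by every product in a category."""
    lighting: str
    palette: str
    setting: str
    mood: str


# Hypothetical archetype table; wording paraphrases the examples in the text.
ARCHETYPES: dict[str, Archetype] = {
    "sleep": Archetype("warm, dim evening light", "soft ambers and muted neutrals",
                       "bedroom nightstand", "calm, winding down"),
    "energy": Archetype("bright morning sunlight", "crisp whites and fresh accents",
                        "sunlit breakfast table", "awake, energetic"),
    "gut_health": Archetype("clean, even daylight", "light neutrals and natural greens",
                            "kitchen counter", "fresh, wholesome"),
}

# Scene templates per shot type; the PDP template ignores the archetype's setting.
SCENES: dict[str, str] = {
    "pdp": "on a clean white background, shot as a studio product photo",
    "in_situ": "placed naturally on a {setting}",
    "lifestyle": "in an everyday scene by a {setting}, with a person using it",
}


def build_prompt(product_name: str, category: str, shot_type: str, details: str) -> str:
    """Compose a prompt from the category archetype plus product-specific details.

    Raises KeyError if the category or shot type has no entry.
    """
    archetype = ARCHETYPES[category]
    # str.format ignores unused keyword arguments, so the PDP template works too.
    scene = SCENES[shot_type].format(setting=archetype.setting)
    return (
        f"Photograph of {product_name} ({details}) {scene}. "
        f"Lighting: {archetype.lighting}. Color palette: {archetype.palette}. "
        f"Mood: {archetype.mood}. "
        "Match the product's shape, label, and colors to the reference image."
    )
```

With this shape, adding a product only means choosing an existing category key, and adding a category means writing one new `Archetype` entry rather than a new set of prompts.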

Each generation request includes quality flags for higher resolution output. The pipeline retries failed requests, saves outputs under a structured naming convention, and logs which prompt produced each image so I can iterate on the archetypes over time.
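The retry, naming, and logging steps can be sketched independently of any particular API client. This assumes a hypothetical `generate` callable that wraps the actual Gemini request and returns image bytes; the file-naming pattern and the `prompts.jsonl` log format are illustrative assumptions, not the pipeline's real conventions.

```python
import json
import random
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def generate_with_retry(
    generate: Callable[[str], bytes],  # hypothetical wrapper around the API call
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> bytes:
    """Call `generate`, retrying with exponential backoff plus jitter on failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return generate(prompt)
        except Exception:  # broad for the sketch; narrow to transient errors in practice
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")


def output_name(product_slug: str, shot_type: str, variant: str, archetype: str) -> str:
    """Hypothetical naming convention: <product>__<shot>-<variant>__<archetype>.png"""
    return f"{product_slug}__{shot_type}-{variant}__{archetype}.png"


def save_and_log(out_dir: Path, filename: str, image_bytes: bytes,
                 prompt: str, archetype: str) -> Path:
    """Write the image, then append a JSON line linking the file to its prompt."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    path.write_bytes(image_bytes)
    record = {
        "file": filename,
        "archetype": archetype,
        "prompt": prompt,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    with (out_dir / "prompts.jsonl").open("a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return path
```

An append-only JSON Lines log keeps every prompt/file pair greppable, which is enough to trace a weak output back to the archetype wording that produced it.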

Key Decisions

Archetype-based prompting over per-product prompts. Writing a unique prompt for every product doesn’t scale and produces inconsistent results. Archetypes give the brand a consistent visual language while still adapting to product specifics. Adding a new product means selecting an archetype, not writing a new prompt from scratch.
4 fixed image slots over flexible generation. Constraining the output to exactly 4 images (2x PDP, 1x in-situ, 1x lifestyle) matches how ecommerce product pages are actually built. Every product needs the same slots filled. Open-ended generation sounds more powerful but produces images you can’t use without additional art direction.
Gemini API over Midjourney or DALL-E. Gemini’s image generation handled product photography prompts with better consistency for our use case, particularly for maintaining product shape accuracy from reference images. The API-first approach also meant the pipeline could run without manual interaction.

What I Learned

The quality gap between “looks good in a folder” and “works on a product page” is real. Generated images that looked impressive as standalone visuals fell apart when placed in a Shopify product grid next to real photography. Consistency matters more than any single image’s quality. The archetype system exists because of this lesson.

I also learned that AI product photography isn’t a replacement for a photographer. It’s a replacement for the “we need something up today and the shoot isn’t until next month” problem. The generated images serve as production-ready placeholders and social content, not as the final word on the brand’s visual identity.

Built for Little Luxuries, a personal ecommerce project. Pipeline architecture, prompting approach, and before/after examples shown freely.
