# Search and Recommendation System
Multimodal search and recommendation platform with a full CI/CD pipeline, monitoring, and A/B experiments.
One-liner: I increased search conversion by 54% and cut request cost by a third by building a multimodal search and recommendation platform for 10M+ SKUs in 90 days.
## What the system does in simple terms
Problem: searching 10M+ products in e-commerce was slow (620 ms p95), and multimodal search and recommendations were expensive ($0.28 per QPS). Clients lost up to $0.7M monthly to slow search, and the GPT-4 token bill reached $45k per month.
Solution: the system understands text and images through a CLIP encoder, finds similar products via FAISS HNSW in 175 ms, and ranks them precisely with a cross-encoder. An LLM cascade (Claude 3 → Mistral-7B) avoids paying for the expensive model on every request, and TensorRT INT8 delivers a 3.6x inference speedup.
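The cascade idea can be sketched as a confidence-gated router: try the cheap model first and escalate only when its confidence is low. This is a minimal sketch, not the project's implementation; the `call_model` stub, its canned answers, and the 0.8 threshold are all illustrative assumptions standing in for real API clients and tuned values.

```python
# Sketch of a two-tier LLM cascade: cheap model first, escalate on low confidence.
# The call_model stub, canned responses, and threshold are illustrative placeholders.

def call_model(model: str, query: str) -> tuple[str, float]:
    """Stub for a real LLM client; returns (answer, confidence).

    A production version would call the provider API and derive confidence
    from logprobs or a lightweight verifier model.
    """
    canned = {
        "mistral-7b": ("cheap answer", 0.55),
        "claude-3": ("expensive answer", 0.95),
    }
    return canned[model]

def cascade(query: str, threshold: float = 0.8) -> tuple[str, str]:
    """Route a query through the cascade; return (model_used, answer)."""
    answer, confidence = call_model("mistral-7b", query)
    if confidence >= threshold:
        return "mistral-7b", answer   # cheap path: no escalation needed
    answer, _ = call_model("claude-3", query)
    return "claude-3", answer         # fallback to the stronger model
```

With this routing, the bulk of traffic stays on the small model and only ambiguous queries pay the large-model price, which is where the token-bill savings come from.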
Savings: p95 latency dropped from 620 ms to 175 ms (−72%), and cost fell by 33% (from $0.28 to $0.19 per QPS). CTR rose 54%, and the zero-result rate fell from 28% to 4%. GMV grew by $8.4M, and support tickets dropped by 52%.
ML part: the platform uses CLIP ViT-L/14 fine-tuned on 42M pairs for multimodal search, FAISS HNSW with category sharding, TensorRT optimization, and a cross-encoder for reranking. The system automatically retrains on drift detected via Evidently and sustains 300 QPS at a 99.97% SLA.
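Category sharding can be illustrated without FAISS itself: keep one index per category and search only the shards relevant to the query, which shrinks the candidate set before any ranking happens. The brute-force cosine search below is a stand-in for a per-shard HNSW index; the class, SKU ids, and toy vectors are illustrative, not the project's code.

```python
import math
from collections import defaultdict

# Stand-in for per-category FAISS HNSW shards: one brute-force index per
# category, searched only for the categories mapped to the query.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ShardedIndex:
    def __init__(self):
        self.shards = defaultdict(list)  # category -> [(sku_id, embedding)]

    def add(self, category, sku_id, embedding):
        self.shards[category].append((sku_id, embedding))

    def search(self, query_emb, categories, k=3):
        """Search only the requested shards, merge, and keep the top-k."""
        hits = []
        for cat in categories:
            for sku_id, emb in self.shards[cat]:
                hits.append((cosine(query_emb, emb), sku_id))
        hits.sort(reverse=True)
        return [sku for _, sku in hits[:k]]

index = ShardedIndex()
index.add("shoes", "sku-1", [1.0, 0.0])
index.add("shoes", "sku-2", [0.7, 0.7])
index.add("bags", "sku-3", [0.0, 1.0])

# A query restricted to the "shoes" shard never scans "bags" at all.
top = index.search([1.0, 0.1], ["shoes"], k=2)
```

The design choice is the usual sharding trade-off: each query touches fewer vectors (lower latency, smaller indexes per node), at the cost of depending on correct category prediction for recall.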
## TL;DR
| Before | After | What changed |
|---|---|---|
| p95 latency: 620 ms | 175 ms | −72% latency |
| Cost: $0.28/QPS | $0.19/QPS | −33% cost |
| CTR: baseline | +54% | +54% CTR |
| Zero results: 28% | 4% | −86% zero results |
| Manual deployment | CI/CD + canary | Zero-downtime deployment |
## FAQ
### What changed most after deploying this search and recommendation stack?
Latency dropped substantially, recommendation relevance improved, and request economics became predictable through model and infrastructure optimization.
### Why use a multimodal architecture here?
Combining text and image representations improved retrieval quality for diverse product queries where lexical matching alone was insufficient.
### How was release risk controlled in production?
The team used CI/CD with canary and A/B rollout patterns, backed by monitoring and rollback criteria tied to both quality and performance signals.
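A rollback gate of this kind reduces to a decision function over the signals mentioned above: canary performance and quality compared against the baseline. The sketch below assumes p95 latency and CTR as the two signals; the threshold values are illustrative, not the project's actual criteria.

```python
# Canary rollback gate sketch: compare canary metrics to the baseline and
# roll back if either performance or quality regresses past its budget.
# Threshold values are illustrative assumptions.

def should_rollback(baseline, canary,
                    max_p95_regression=0.10,  # allow up to +10% relative p95 latency
                    max_ctr_drop=0.02):       # allow up to 2pp absolute CTR drop
    latency_regression = (canary["p95_ms"] - baseline["p95_ms"]) / baseline["p95_ms"]
    ctr_drop = baseline["ctr"] - canary["ctr"]
    return latency_regression > max_p95_regression or ctr_drop > max_ctr_drop

baseline = {"p95_ms": 175, "ctr": 0.081}
healthy = {"p95_ms": 180, "ctr": 0.082}   # small jitter: keep rolling out
degraded = {"p95_ms": 240, "ctr": 0.079}  # latency blown: roll back
```

Gating on both axes matters: a canary can be faster yet convert worse (or vice versa), so neither signal alone is a safe promotion criterion.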