# Search and Recommendation System
Multimodal search and recommendation platform with a full CI/CD pipeline, monitoring, and A/B experiments.
One-liner: I increased search conversion by 54% and cut request cost by a third by building a multimodal search and recommendation platform for 10M+ SKUs in 90 days.
## What the system does in simple terms
Problem: searching 10M+ products in e-commerce was slow (620 ms p95), and multimodal search and recommendations were expensive ($0.28 per QPS). Clients lost up to $0.7M monthly to slow search, and the GPT-4 token bill reached $45k per month.
Solution: the system understands text and images through a CLIP encoder, finds similar products via FAISS HNSW in 175 ms, and ranks them precisely with a cross-encoder. An LLM cascade (Claude 3 → Mistral-7B) avoids paying for the expensive model on every request, and TensorRT INT8 delivers a 3.6x inference speedup.
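The cascade idea can be sketched as a confidence-gated router: try the cheap model first and escalate only when its confidence is low. This is a minimal sketch, not the project's implementation; the `call_model` stub, its canned answers, and the 0.8 threshold are all illustrative assumptions standing in for real API clients and tuned values.

```python
# Sketch of a two-tier LLM cascade: cheap model first, escalate on low confidence.
# The call_model stub, canned responses, and threshold are illustrative placeholders.

def call_model(model: str, query: str) -> tuple[str, float]:
    """Stub for a real LLM client; returns (answer, confidence).

    A production version would call the provider API and derive confidence
    from logprobs or a lightweight verifier model.
    """
    canned = {
        "mistral-7b": ("cheap answer", 0.55),
        "claude-3": ("expensive answer", 0.95),
    }
    return canned[model]

def cascade(query: str, threshold: float = 0.8) -> tuple[str, str]:
    """Route a query through the cascade; return (model_used, answer)."""
    answer, confidence = call_model("mistral-7b", query)
    if confidence >= threshold:
        return "mistral-7b", answer   # cheap path: no escalation needed
    answer, _ = call_model("claude-3", query)
    return "claude-3", answer         # fallback to the stronger model
```

With this routing, the bulk of traffic stays on the small model and only ambiguous queries pay the large-model price, which is where the token-bill savings come from.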
Savings: p95 latency dropped from 620 ms to 175 ms (−72%), and cost fell by 33% (from $0.28 to $0.19 per QPS). CTR rose 54%, and the zero-result rate fell from 28% to 4%. GMV grew by $8.4M, and support tickets dropped by 52%.
ML part: the platform uses CLIP ViT-L/14 fine-tuned on 42M pairs for multimodal search, FAISS HNSW with category sharding, TensorRT optimization, and a cross-encoder for reranking. The system automatically retrains on drift detected via Evidently and sustains 300 QPS at a 99.97% SLA.
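Category sharding can be illustrated without FAISS itself: keep one index per category and search only the shards relevant to the query, which shrinks the candidate set before any ranking happens. The brute-force cosine search below is a stand-in for a per-shard HNSW index; the class, SKU ids, and toy vectors are illustrative, not the project's code.

```python
import math
from collections import defaultdict

# Stand-in for per-category FAISS HNSW shards: one brute-force index per
# category, searched only for the categories mapped to the query.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class ShardedIndex:
    def __init__(self):
        self.shards = defaultdict(list)  # category -> [(sku_id, embedding)]

    def add(self, category, sku_id, embedding):
        self.shards[category].append((sku_id, embedding))

    def search(self, query_emb, categories, k=3):
        """Search only the requested shards, merge, and keep the top-k."""
        hits = []
        for cat in categories:
            for sku_id, emb in self.shards[cat]:
                hits.append((cosine(query_emb, emb), sku_id))
        hits.sort(reverse=True)
        return [sku for _, sku in hits[:k]]

index = ShardedIndex()
index.add("shoes", "sku-1", [1.0, 0.0])
index.add("shoes", "sku-2", [0.7, 0.7])
index.add("bags", "sku-3", [0.0, 1.0])

# A query restricted to the "shoes" shard never scans "bags" at all.
top = index.search([1.0, 0.1], ["shoes"], k=2)
```

The design choice is the usual sharding trade-off: each query touches fewer vectors (lower latency, smaller indexes per node), at the cost of depending on correct category prediction for recall.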
## TL;DR
| Before | After | What changed |
|---|---|---|
| p95 latency: 620 ms | 175 ms | −72% latency |
| Cost: $0.28/QPS | $0.19/QPS | −33% cost |
| CTR: baseline | +54% | +54% CTR |
| Zero results: 28% | 4% | −86% zero results |
| Manual deployment | CI/CD + canary | Zero-downtime deployment |
## FAQ
### What changed most after deploying this search and recommendation stack?
Latency dropped substantially, recommendation relevance improved, and request economics became predictable through model and infrastructure optimization.
### Why use a multimodal architecture here?
Combining text and image representations improved retrieval quality for diverse product queries where lexical matching alone was insufficient.
### How was release risk controlled in production?
The team used CI/CD with canary and A/B rollout patterns, backed by monitoring and rollback criteria tied to both quality and performance signals.
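A rollback gate of this kind reduces to a decision function over the signals mentioned above: canary performance and quality compared against the baseline. The sketch below assumes p95 latency and CTR as the two signals; the threshold values are illustrative, not the project's actual criteria.

```python
# Canary rollback gate sketch: compare canary metrics to the baseline and
# roll back if either performance or quality regresses past its budget.
# Threshold values are illustrative assumptions.

def should_rollback(baseline, canary,
                    max_p95_regression=0.10,  # allow up to +10% relative p95 latency
                    max_ctr_drop=0.02):       # allow up to 2pp absolute CTR drop
    latency_regression = (canary["p95_ms"] - baseline["p95_ms"]) / baseline["p95_ms"]
    ctr_drop = baseline["ctr"] - canary["ctr"]
    return latency_regression > max_p95_regression or ctr_drop > max_ctr_drop

baseline = {"p95_ms": 175, "ctr": 0.081}
healthy = {"p95_ms": 180, "ctr": 0.082}   # small jitter: keep rolling out
degraded = {"p95_ms": 240, "ctr": 0.079}  # latency blown: roll back
```

Gating on both axes matters: a canary can be faster yet convert worse (or vice versa), so neither signal alone is a safe promotion criterion.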