A field guide for search teams adding tool calling as a bounded capability layer without losing relevance, latency discipline, safety, or rollback control.
Igor Yakushev
Senior ML Engineer | Search, Retrieval, RecSys
I build search, retrieval, ranking, and recommender systems. I write about architecture, evaluation, and production ML: articles and case studies on release safety, reliability, and cost.
Articles
Production notes on search, recommender systems, LLM agents, and ML reliability.
Official timeline, architecture mapping, breaking changes, and a practical runbook to migrate from Assistants API to Responses plus Conversations without service regressions.
A practical guide to offline-online regressions in RecSys: feedback loops, delayed labels, train/serve skew, OPE limits, 11 release gates, and an incident playbook.
Case Studies
Production ML case studies with architecture decisions, rollout logic, and measurable outcomes.
Voice AI Operator for Contact Center
An on-prem voice AI operator for a financial contact center that automated 72% of inbound calls, reduced cost per call by 58%, and stayed inside strict compliance boundaries.
Problem: A 600-seat financial contact center had nine-minute queue times, SLA penalties, and new AI governance requirements that made cloud-heavy automation hard to justify.
Solution: An on-prem voice AI stack with streaming ASR, model cascade, RAG-backed answer grounding, safety controls, and guaranteed human escalation.
ML Inference Latency and Cost Evaluation Platform
An internal ML platform for benchmarking latency, throughput, GPU utilization, and cost per request so teams could ship models with consistent release criteria.
RAG Assistant for Catalog
Production RAG catalog assistant with hybrid retrieval, reranking, and cost-aware serving that cut zero-result searches and improved CTR inside a one-GPU operating envelope.
About Me
I'm Igor Yakushev. I design ML solutions that handle traffic, save money, and don't break on Saturday night.
I started in business and marketing before moving into engineering. Today I own search and recommendation systems running at 10M+ requests per day.
My focus is systems that live under load, don't break, and don't require a hero.
"Make AI boring again."
ML Engineer · System design · Product approach
Contact
If you have a production ML problem worth fixing, send the context and I'll reply directly.
Igor Yakushev
Senior ML Engineer
Search, Retrieval, RecSys, and production ML for systems at 10M+ requests/day.
Fastest way
Message me on Telegram