About Me
Designing ML systems that work on real traffic: search, generation, recommendations, pricing. Here's my path from business to ML development.
Thinking
I finished my engineering degree in 2012. It brought no great revelations, but it taught me the main thing: break complex problems into parts and look for patterns.
At 20, I started working inside a large government machine and saw how decisions depend not on people, but on regulations, approvals, and piles of paperwork.
Built a business
At the start, everything was done manually through chats: edits went through 10 rounds, deadlines lived in Excel. No tracking, just "whoever remembers is right".
When the team grew from 5 to 30+ people, I realized that half our time went to putting out fires and forwarding tasks between people.
I had to redesign everything: I broke business processes into clear stages, implemented a CRM, and replaced manual reports with scripts. To test my hypotheses faster, I integrated APIs myself and wrote scripts for employees.
With the transition to a systems approach, profitability grew by 22%, and the number of overdue tasks in projects dropped to almost zero.
I realized: structure is power, it scales, people don't.
From business to code
I started writing code more and more often. IT had what I lacked in marketing: transparent logic. There is an input, there is code, there is a result.
I realized that scaling the agency model wouldn't work, so I started building my own IT projects. I made parsers and bots in Python and built services to test how SQL data, logic, and automation connect into a working system.
My launches were engineering experiments in startup form: how the system works, where it breaks, how to simplify and scale it. Only one thing mattered to me: that everything worked technically not once, but every time.
Jump into ML
Guides already existed, but they were scattered: Jupyter notebooks, Colab, TensorFlow articles. I tried to build infrastructure around them and realized that without systematic knowledge I couldn't move forward.
I was interested in the engineering side of ML and Data Science: how infrastructure works, how models go from training to production.
To understand this in practice, I ran mini-services on Heroku, but soon hit the limit: I wanted to understand how high-traffic systems hold up, where a failure costs money. I decided to get that knowledge in Big Tech. There were no technical vacancies at the time, but my marketing experience opened a way in through Google, where I was responsible for marketing analytics and digital products.
Inside, I went through ML programs: first the theory, then production tools. That is where I got my first experience with BigQuery and Airflow pipelines, and tested TFX. I understood how real systems work: deployment, logging, stability requirements. This laid the foundation for a systematic approach to ML infrastructure.
ML in production
Since May 2022 I've been responsible for the ML chain of a B2B platform. I built modules for pricing, description generation, and demand forecasting on XGBoost and scikit-learn, and designed an end-to-end pipeline with auto-retraining, fallbacks, and monitoring, maintaining a 99.9% SLA at ~1 million predictions per day.
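The fallback part of that pipeline can be illustrated with a minimal sketch. All names here are hypothetical, not the production code: if the primary model fails or runs too slowly, a cheap baseline answers instead, which is what keeps the SLA intact.

```python
import time

# Hypothetical sketch of the fallback pattern: if the primary model
# fails or exceeds its latency budget, serve a simpler baseline
# instead of returning an error.
class FallbackPredictor:
    def __init__(self, primary, baseline, timeout_s=0.1):
        self.primary = primary      # e.g. an XGBoost model
        self.baseline = baseline    # e.g. a moving-average heuristic
        self.timeout_s = timeout_s

    def predict(self, features):
        start = time.monotonic()
        try:
            result = self.primary(features)
            if time.monotonic() - start > self.timeout_s:
                raise TimeoutError("primary exceeded latency budget")
            return result, "primary"
        except Exception:
            # A degraded answer beats no answer: the SLA survives.
            return self.baseline(features), "fallback"

def failing_model(features):
    raise RuntimeError("model down")  # simulate an outage

pred, source = FallbackPredictor(
    primary=failing_model,
    baseline=lambda f: sum(f) / len(f),
).predict([10.0, 12.0, 14.0])
```

In production the same wrapper sits behind the monitoring layer, so every fallback invocation is also a logged signal that the primary path degraded.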
In 2023 I joined an engineering team building an AI platform for e-commerce. I worked on the architecture of multimodal search and recommendations: from building embeddings to online ranking.
Under the hood, CLIP models and LLMs convert text and image queries into unified vectors, a fast FAISS index retrieves candidates, and on top a hybrid of BM25 and a neural network re-ranks them.
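A minimal sketch of that two-stage ranking, under stated assumptions: the embeddings come from some encoder (CLIP in the real system), the FAISS index is replaced here with brute-force cosine search, and the neural re-ranker with a simple linear blend of vector and BM25 scores.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=3):
    # Stage 1: nearest-neighbour candidates by cosine similarity
    # (a FAISS IndexFlatIP over normalized vectors does this at scale).
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(-scores)[:k]
    return top, scores[top]

def rerank(candidates, vec_scores, bm25_scores, alpha=0.5):
    # Stage 2: hybrid re-ranking -- blend lexical (BM25) and vector
    # scores; the real system uses a neural model instead of a blend.
    blended = alpha * vec_scores + (1 - alpha) * bm25_scores[candidates]
    order = np.argsort(-blended)
    return candidates[order]

rng = np.random.default_rng(0)
docs = rng.normal(size=(10, 8))                # toy document embeddings
query = docs[3] + 0.01 * rng.normal(size=8)    # query close to doc 3
cands, vscores = retrieve(query, docs, k=3)
bm25 = rng.random(10)                          # toy BM25 scores
ranking = rerank(cands, vscores, bm25)
```

The split matters operationally: the candidate stage must be fast and recall-oriented, while the re-ranker can afford a heavier model over just the top-k.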
Implemented online fine-tuning on clicks: CTR grew by 14%, infrastructure costs fell by 30%.
Architecture that's trusted
I focus on platform architecture for ML products: auto-updating pipelines, observability, fault tolerance. I build systems so that engineers don't spend their days patching bugs.
The final test for architecture is when it works, even if you're on vacation.
Articles
Recommended Articles
Architecture notes on ML/LLM systems: decisions, risks, and operations.
A practical engineering framework for choosing between workflow and agent: criteria, architecture patterns, evals, security, cost, and rollout plan.
A practical guide to shipping a support RAG agent with tool-calls: architecture contract, release gates, policy enforcement, observability, and FinOps.
Cases
Related Cases
Projects with clear problem framing, architecture, and measurable outcomes.
Voice AI Operator for Call Center
An on-prem voice AI operator handles 72% of calls without a human, with a 0.96 s response time and a 58% cost reduction.
Problem: a 600-seat contact center, 9-minute waits, SLA penalties and new AI Act requirements; regulations went out of date faster than operators could learn them.
Solution: On-prem stack with streaming, model cascade, orchestration, and knowledge base. Safety rules and manual escalation.
ML Inference Latency and Cost Evaluation Platform
Internal tool for profiling latency, throughput, and $/req of models in production
RAG Assistant for Catalog
MVP chat search with deployment automation, experiments, and quality monitoring
Where my experience was applied
Contact
Ready to discuss ML projects and implementations; I respond personally.
Igor Yakushev,
ML Engineer
Senior/Staff ML Engineer. System design, ownership, high-traffic ML systems.
Fastest way
Write in Telegram