About Me

This page covers my current focus, recent projects, and the path that brought me into machine learning.

Search · Retrieval · RecSys · Multimodal Search · GenAI

Igor Yakushev

Senior ML Engineer | Search, Retrieval, RecSys

I build search, recommender, and NLP/GenAI systems on live traffic. I own architecture, experimentation, evals, and rollout across product, data, and infra.

  • 10M+ queries per day: search and recommendations
  • End-to-end ownership: architecture, experiments, releases
  • NLP and evals: quality and regressions
  • Boundaries and contracts

    I start with system boundaries: data, serving, fallback paths, and failure points.

  • Evals before release

    Changes go through eval suites, regressions, online metrics, canaries, and rollback.

  • Decisions through constraints

    I set priorities through latency, cost, SLO, throughput, and product metrics.

  • Ranking and recsys

    Ranking quality under latency, cost, and cold-start constraints.

  • Experiments

    A/B tests, online metrics, and decisions backed by numbers.

  • Reliability

    SLOs, monitoring, tracing, and incident response.

  • Latency and cost

    Latency budgets, throughput, and inference cost control.

  • NLP and GenAI

    Classification, generation, RAG, and agent flows with evals.

  • Product and economics

    Technical decisions tied to revenue, unit economics, and operating metrics.

From business to machine learning

The key transitions that shaped how I approach architecture, releases, and operating ML systems.

2010–2013

Thinking

I never had a dream of becoming a programmer. What drew me in was systems, and the reasons they worked the way they did.

I finished an engineering degree in 2012. It did not hand me some grand revelation, but it taught me the habit that still matters most: break complex things down and look for patterns.

At twenty, I found myself inside a large state bureaucracy and saw how much decisions depend on process, approvals, and paperwork. That is where my bias toward structure really took hold.

  • Efficiency comes from structure, not effort
2013–2016

Built a business

I started a digital marketing agency and fairly quickly hit the point where growth began to eat efficiency.

At first, everything ran through chats and manual coordination: endless revision loops, deadlines in Excel, and task ownership determined by whoever happened to remember.

When the team grew from 5 to 30+, I realized half our time was going into firefighting and forwarding work between people.

So I redesigned the operation. I broke the work into clear stages, introduced a CRM, killed manual reporting, and replaced it with scripts. To test ideas faster, I wired up APIs myself and built small internal tools for the team.

Once the system became explicit, profitability went up 22% and overdue work nearly disappeared. That was the moment I learned a lesson that never left me: structure scales better than heroics.

  • Managed complexity through structure, not people
  • Wrote code to remove dependence on manual work
2016–2019

From business to code

The market was moving toward recommender systems, Big Data, and automation. I could see where it was going, and I tried to reposition the agency toward software work: startup projects, product sites, and staff-augmentation (outstaff) development.

I was writing more and more code myself. What drew me to software was its clarity. There is an input, there is code, and there is an outcome.

Eventually it became obvious that the agency model was the wrong vehicle for that transition. I started building small IT products of my own: parsers, Python bots, and services that tied together SQL, business logic, and automation.

Those were not polished launches. They were engineering experiments. I wanted to understand where systems break, how to simplify them, and how to make them work reliably rather than just once.

  • Moved to systems where results depend on logic, not taste
  • Formed an approach: "works" means it doesn't break, even without me
2019–2021

Entering ML

In 2019, I built my first ML model out of forum datasets and whatever tools I could get my hands on. It was a Keras network that generated music. It was rough, but that was not the point. The point was that the model did something on its own.

The learning materials were there, but fragmented: Jupyter notebooks, Colab demos, TensorFlow blog posts. I started trying to build infrastructure around the model and quickly realized that the real problem was not the model alone. It was the system around it.

That is what pulled me toward the engineering side of ML: data pipelines, training flows, deployment, observability, and the path from experiment to production.

I ran small services on Heroku to learn by doing. It helped, but only up to a point. I wanted to understand how high-traffic systems hold up when failure has an actual cost. That is what pushed me toward Big Tech. There were no technical openings at the time, but my marketing background got me into Google, where I worked on analytics and digital products.

Inside Google, I went through ML programs that moved from theory to production tooling. That is where I first worked with BigQuery, Airflow, and TFX and started to understand what real ML systems require: stability, logging, release discipline, and clear operating boundaries.

  • ML without infrastructure is a toy
  • Focus: design solutions that work under load
2021–2024

ML systems engineering

When Google shut the office down, I was offered relocation back into a marketing role. By then I knew I wanted something else: not campaign ownership, but architecture, reliability, and the day-to-day behavior of ML systems in production.

From May 2022, I owned the ML pipeline for a B2B platform. I built pricing, description generation, and demand forecasting modules on XGBoost and Scikit-learn. Around them, I designed a full pipeline with automatic retraining, fallback paths, and end-to-end monitoring. It held a 99.9% SLA at roughly one million predictions per day.
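
The fallback pattern behind that SLA can be sketched roughly like this; the function signature and the logging hook are illustrative assumptions, not the actual platform code:

```python
# Minimal sketch of a prediction path with a fallback, assuming the model
# call can fail or time out. Names (model_predict, fallback) are hypothetical.

def predict_with_fallback(features, model_predict, fallback, on_error=None):
    """Return (value, source): the model's prediction, or a fallback value
    (e.g. a cached price or a simple heuristic) if the model call fails."""
    try:
        return model_predict(features), "model"
    except Exception as exc:
        if on_error:
            on_error(exc)  # hook for logging and alerting on degraded serving
        return fallback(features), "fallback"
```

The point of making the fallback explicit is that monitoring can then track the model/fallback ratio as a first-class health signal, instead of discovering degradation through downstream metrics.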

In 2023, I moved to an AI platform team in e-commerce. The scale changed completely: multimodal search and recommendations, embeddings, candidate retrieval, online ranking, and hard latency and cost constraints.

Under the hood, CLIP and LLMs mapped text and image queries into a shared vector space, FAISS handled fast retrieval, and a hybrid of BM25 and neural reranking produced the final ordering. Later I launched click-driven online learning: CTR went up 14%, while infrastructure costs dropped 30%.
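
The score-fusion step of that hybrid can be sketched as follows; the min-max normalization and the blend weight `alpha` are illustrative assumptions, not the production configuration:

```python
# Sketch of hybrid reranking: blend normalized BM25 (lexical) and vector
# (semantic) scores, then sort candidates by the fused score.

def min_max(scores):
    """Scale scores to [0, 1] so lexical and semantic scales are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(candidates, bm25_scores, vector_scores, alpha=0.6):
    """Order candidates by a weighted blend; higher alpha favors the
    embedding similarity, lower alpha favors exact lexical matches."""
    b = min_max(bm25_scores)
    v = min_max(vector_scores)
    fused = [alpha * vs + (1 - alpha) * bs for bs, vs in zip(b, v)]
    order = sorted(range(len(candidates)), key=lambda i: fused[i], reverse=True)
    return [candidates[i] for i in order]
```

In practice the interesting work is in choosing the normalization and weight per query segment, since head queries and long-tail queries reward the lexical and semantic signals very differently.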

  • Production ML starts where the model becomes part of a system
  • Search and ranking force you to hold quality, cost, and reliability at the same time
2024–present

Trusted architecture

Today I own search and recommendation architecture in e-commerce at roughly 10 million requests a day. The stack includes Triton for inference, CLIP and FAISS for multimodal retrieval, BM25 and reranking models, and ONNX-backed recommendation flows. At that scale, a mistake does not stay theoretical for long. It shows up in revenue, latency, and result quality.

By 2025, the problem itself had changed. After the rise of generative models and agent-like interfaces, search stopped being just a ranked list of documents, and recommendations stopped being just nearest-neighbor matching. More and more often, the system is expected to understand intent, hold context, and move the user toward the next useful action.

That raised the bar for architecture. Fast retrieval and good reranking are no longer enough on their own. The system has to absorb noisy queries, external model degradation, traffic spikes, long feature chains, and constant experimentation without turning into a brittle mess.

My focus shifted from individual models to the platform layer around them: self-updating pipelines, observability, quality controls, fallback paths, reproducible releases, and clear boundaries between components. The noisier the environment gets, the more the system needs to stay legible on the inside.

My definition of trusted architecture is simple. It scales, absorbs new scenarios, and does not depend on heroics. Agent-style workflows make that even more obvious: they do not remove the need for engineering. They expose weak architecture faster.

  • Agent-style workflows do not replace architecture. They stress it harder and reveal the cracks faster.
  • A strong system runs on design, not heroics.

Where my experience was applied

  • ViSenze – AI platform for visual e-commerce search
  • Huawei – telecom and infrastructure
  • Ozon – marketplace and e-commerce
  • Google – global leader in ML
  • Media Instinct – marketing agency

Recommended Articles

Architecture notes on ML/LLM systems: decisions, risks, and operations.

Contact

If you have a production ML problem worth fixing, send the context and I'll reply directly.