About Me

Designing ML systems that work on real traffic: search, generation, recommendations, pricing. Here's my path from business to ML development.

2010–2013

Thinking

I never dreamed of "becoming a programmer". I was interested in systems and in why they work the way they do.

I finished my engineering degree in 2012. It brought no great revelations, but it taught me the main thing: break complex things into parts and look for patterns.

At 20, I started working inside a large government machine and saw how decisions depend not on people but on regulations, approvals, and piles of paperwork.

Efficiency comes from structure, not effort
2013–2016

Built a business

I launched a digital marketing agency and quickly reached the point where growth started eating efficiency.

At the start, everything was done manually through chats: edits in ten rounds, deadlines in Excel, no tracking, only "whoever remembers is right".

When the team grew from 5 to 30+ people, I realized that half my time went to putting out fires and forwarding tasks between people.

I had to redesign everything: I broke the business processes into clear stages, implemented a CRM, and replaced manual reports with scripts. To test my hypotheses faster, I integrated APIs myself and wrote scripts for employees.

With the shift to a systems approach, profitability grew by 22% and the number of overdue project tasks dropped to almost zero.

I realized that structure is power: it scales, people don't.

Managed complexity through structure, not people
Coded to remove dependence on manual work
2016–2019

From business to code

The digital market was shifting toward recommendation systems, Big Data, and automation. I saw that this was the future and tried to reorient the agency toward development: took on startups, built product sites, moved into outstaffing.

I started writing code more and more often. IT had what I lacked in marketing: transparent logic. There is an input, there is code, there is a result.

I realized that scaling the agency model wouldn't work, so I started building my own IT projects: parsers and bots in Python, and services that tested how SQL data, logic, and automation could be connected into a working system.

My launches were engineering experiments in startup form: how a system works, where it breaks, how to simplify and scale it. Only one thing mattered to me: that everything technically worked not once, but every time.

Moved to systems where results depend on logic, not taste
Formed an approach: "works" means it doesn't break, even without me
2019–2021

Jump into ML

In 2019 I built my first ML model from scratch, out of improvised tools and forum datasets: a Keras neural network that generated music. It worked roughly, but the main thing was that it generated on its own.

Guides already existed, but they were scattered: Jupyter notebooks, Colab, TensorFlow articles. I tried to build infrastructure around the model and realized that without systematic knowledge I couldn't move forward.

I was interested in the engineering side of ML and Data Science: how the infrastructure works and how models get from training to production.

To understand this in practice, I ran mini-services on Heroku, but I soon saw the limit: I wanted to understand how high-traffic systems hold up, where a failure costs money. I decided to go after Big Tech experience. There were no technical vacancies at that moment, but my marketing background opened a way in through Google, where I was responsible for marketing analytics and digital products.

Inside, I went through ML programs, first in theory, then on production tools. That's where I got my first experience with BigQuery and Airflow pipelines, and tried out TFX. I came to understand how real systems work: deployment, logging, stability requirements. This laid the foundation for a systematic approach to ML infrastructure.

ML without infrastructure is a toy
Focus: design solutions that work under load
2021–2024

ML in production

Google closed the office. They offered relocation, but again into marketing, so I declined: I didn't want to maintain ML, I wanted to be responsible for architecture and production.

Since May 2022 I've been responsible for the ML chain in a B2B platform: built modules for pricing, description generation, and demand forecasting on XGBoost and scikit-learn, and designed an end-to-end pipeline with auto-retraining, fallbacks, and monitoring that held a 99.9% SLA at ~1 million predictions per day.
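For context, a minimal sketch of the two patterns that pipeline leans on, prediction with a fallback and a drift-triggered retrain; the thresholds and function names are illustrative, not the production code.

```python
import numpy as np
import xgboost as xgb


def predict_with_fallback(model: xgb.XGBRegressor, features: np.ndarray, fallback_price: float) -> float:
    """Serve a prediction, but never let a model failure block the request."""
    try:
        return float(model.predict(features.reshape(1, -1))[0])
    except Exception:
        # Fallback: a rule-based price keeps the endpoint inside the SLA while alerts fire.
        return fallback_price


def needs_retrain(recent_mae: float, baseline_mae: float, tolerance: float = 0.10) -> bool:
    """Trigger auto-retraining when the monitored error drifts past the allowed tolerance."""
    return recent_mae > baseline_mae * (1 + tolerance)


def retrain(X: np.ndarray, y: np.ndarray) -> xgb.XGBRegressor:
    """Fit a fresh candidate on the latest window of data; promotion is gated separately."""
    model = xgb.XGBRegressor(n_estimators=400, max_depth=6, learning_rate=0.05)
    model.fit(X, y)
    return model
```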

In 2023 I joined an engineering team building an AI platform for e-commerce and worked on the architecture of multimodal search and recommendations: from building embeddings to online ranking.

Under the hood, CLIP models and LLMs convert text and image queries into unified vectors, a fast FAISS index retrieves candidates, and on top a hybrid BM25 + neural re-ranker reorders them.
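A condensed sketch of that flow, assuming the sentence-transformers CLIP checkpoint and the rank_bm25 package; in production the re-ranking stage is a neural model, so the simple weighted blend below only stands in for it, and the documents, model name, and weights are illustrative.

```python
import numpy as np
import faiss
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

# CLIP encoder that maps text (and images) into one shared vector space.
encoder = SentenceTransformer("clip-ViT-B-32")

documents = ["red running shoes", "leather office bag", "wireless headphones"]
doc_vectors = encoder.encode(documents, normalize_embeddings=True).astype("float32")

# Dense index for fast candidate retrieval; inner product equals cosine after normalization.
index = faiss.IndexFlatIP(doc_vectors.shape[1])
index.add(doc_vectors)

# Sparse lexical scorer used in the hybrid re-rank stage.
bm25 = BM25Okapi([d.split() for d in documents])


def search(query: str, top_k: int = 3, alpha: float = 0.7) -> list[tuple[str, float]]:
    """Retrieve candidates with CLIP + FAISS, then blend dense and BM25 scores."""
    q_vec = encoder.encode([query], normalize_embeddings=True).astype("float32")
    dense_scores, ids = index.search(q_vec, top_k)

    lexical = bm25.get_scores(query.split())
    results = []
    for score, doc_id in zip(dense_scores[0], ids[0]):
        hybrid = alpha * float(score) + (1 - alpha) * float(lexical[doc_id])
        results.append((documents[doc_id], hybrid))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(search("sneakers for jogging"))
```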

Implemented online fine-tuning on clicks: CTR grew by 14%, infrastructure costs fell by 30%.

2024–present

Architecture that's trusted

Now I'm responsible for search and recommendations in an e-commerce platform with traffic of 10 million requests per day: inference on Triton, FAISS + CLIP retrieval, BM25 re-ranking, recsys models in ONNX. Any error is lost revenue.

My focus is platform architecture for ML products: self-updating pipelines, observability, fault tolerance. I build systems so that engineers aren't stuck patching bugs.

The final test for an architecture is that it keeps working, even when you're on vacation.

Where my experience was applied

Google · ViSenze · Ozon · Huawei · Media Instinct

Services

What I Work With

Solving engineering bottlenecks in ML production

MLOps Infrastructure

When ML grows faster than infrastructure

  • Python
  • Kubernetes
  • MLflow
  • GitHub Actions
  • Docker
  • Terraform

CI/CD for ML: auto-deploy models with versioning, zero-downtime deployment, metrics via Prometheus + OTel
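As an illustration of the "auto-deploy with versioning" part, a sketch of a promotion gate on the MLflow model registry using the classic stage-based API; the registered model name, metric, and regression margin are assumptions for the example.

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()
MODEL_NAME = "pricing-model"   # illustrative registry name
METRIC = "val_mae"             # lower is better
MAX_REGRESSION = 0.02          # tolerate at most a 2% metric regression


def version_metric(version) -> float:
    """Read the validation metric logged by the run that produced this model version."""
    return client.get_run(version.run_id).data.metrics[METRIC]


def promote_if_better() -> None:
    """Move the newest Staging version to Production only if it does not regress."""
    staging = client.get_latest_versions(MODEL_NAME, stages=["Staging"])
    production = client.get_latest_versions(MODEL_NAME, stages=["Production"])
    if not staging:
        return
    candidate = staging[0]

    if production and version_metric(candidate) > version_metric(production[0]) * (1 + MAX_REGRESSION):
        return  # the candidate is worse: keep serving the current Production model

    client.transition_model_version_stage(
        name=MODEL_NAME,
        version=candidate.version,
        stage="Production",
        archive_existing_versions=True,
    )


if __name__ == "__main__":
    promote_if_better()
```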

GenAI / RAG Systems

LLM inference without latency spikes or budget overruns

  • LangChain
  • FastAPI
  • Qdrant
  • OpenAI
  • Mistral
  • Weaviate

–42% cost/req through async RAG and fallback routing with cache, latency ~1.2s (Qdrant, FastAPI)
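A skeleton of that async-with-fallback pattern; the vector search and LLM calls are stubbed out as hypothetical async helpers (the real system talks to Qdrant and an LLM provider), and an in-memory dict stands in for the semantic cache.

```python
import asyncio
import hashlib

from fastapi import FastAPI

app = FastAPI()
_answer_cache: dict[str, str] = {}   # stand-in for the Redis/semantic cache


def cache_key(question: str) -> str:
    return hashlib.sha256(question.strip().lower().encode()).hexdigest()


async def retrieve_context(question: str) -> str:
    """Hypothetical async call to the vector store (Qdrant in the real setup)."""
    await asyncio.sleep(0)   # placeholder for the network call
    return "retrieved passages for: " + question


async def call_llm(question: str, context: str) -> str:
    """Hypothetical async call to the primary LLM provider."""
    await asyncio.sleep(0)
    return f"answer to '{question}' grounded in: {context}"


@app.post("/ask")
async def ask(question: str) -> dict:
    key = cache_key(question)
    if key in _answer_cache:   # cache hit: no retrieval and no LLM spend
        return {"answer": _answer_cache[key], "source": "cache"}

    try:
        context = await asyncio.wait_for(retrieve_context(question), timeout=0.5)
        answer = await asyncio.wait_for(call_llm(question, context), timeout=2.0)
    except asyncio.TimeoutError:
        # Fallback route: degrade gracefully instead of failing the whole request.
        return {"answer": "The service is busy, please retry shortly.", "source": "fallback"}

    _answer_cache[key] = answer   # only successful answers are cached
    return {"answer": answer, "source": "live"}
```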

Recommendation Systems

Behavioral personalization with real-time response

  • PyTorch
  • Faiss
  • Kafka
  • Redis
  • LightGBM
  • Feature Store

Real-time recsys: embeddings + GBDT, caching layers (Kafka, Redis), feature pipeline on an in-house store
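A minimal sketch of that two-stage setup: candidate retrieval over item embeddings followed by a GBDT re-rank. The embeddings and training data below are synthetic so the example runs standalone, and the Kafka / feature-store plumbing is reduced to a comment.

```python
import numpy as np
import faiss
import lightgbm as lgb

EMBEDDING_DIM = 64
N_ITEMS = 10_000
rng = np.random.default_rng(0)

# Item embeddings; in production these come from the trained retrieval model.
item_vectors = rng.standard_normal((N_ITEMS, EMBEDDING_DIM)).astype("float32")
faiss.normalize_L2(item_vectors)

index = faiss.IndexFlatIP(EMBEDDING_DIM)
index.add(item_vectors)

# GBDT ranker; trained on synthetic clicks here just so the sketch runs end to end.
train_X = rng.standard_normal((1_000, EMBEDDING_DIM + 1)).astype("float32")
train_y = rng.integers(0, 2, size=1_000)
ranker = lgb.LGBMClassifier(n_estimators=50).fit(train_X, train_y)


def recommend(user_vector: np.ndarray, top_n: int = 10, candidates: int = 200) -> list[int]:
    """Two-stage recsys: FAISS candidate retrieval, then GBDT re-ranking."""
    q = user_vector.astype("float32").reshape(1, -1)
    faiss.normalize_L2(q)
    scores, ids = index.search(q, candidates)

    # Ranking features: item embedding plus the retrieval score. Real features would
    # come from the feature store that is fed by Kafka click/view events.
    feats = np.hstack([item_vectors[ids[0]], scores[0].reshape(-1, 1)])
    probs = ranker.predict_proba(feats)[:, 1]
    order = np.argsort(-probs)[:top_n]
    return [int(ids[0][i]) for i in order]


print(recommend(rng.standard_normal(EMBEDDING_DIM)))
```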

Smart LLM Request Routing

LLM inference without budget drain on every request

  • OpenAI
  • Claude
  • Mistral
  • scikit-learn
  • FastAPI
  • Redis

Up to –50% cost/req through a scoring classifier (prompt length + token count) and a semantic answer cache
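A sketch of the routing idea: a cheap scorer decides whether a request goes to a small or a large model, and a semantic cache short-circuits near-duplicate questions. The production scorer is a trained scikit-learn classifier; here a heuristic on prompt length and question count stands in for it, and the model identifiers, thresholds, and provider call are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

CHEAP_MODEL = "small-llm"          # illustrative model identifiers
STRONG_MODEL = "large-llm"
CACHE_SIM_THRESHOLD = 0.92         # cosine similarity needed for a cache hit

_cache_vectors: list[np.ndarray] = []
_cache_answers: list[str] = []


def complexity_score(prompt: str) -> float:
    """Cheap stand-in for the scoring classifier: long, multi-question prompts score higher."""
    length = min(len(prompt) / 2000, 1.0)
    questions = min(prompt.count("?") / 3, 1.0)
    return 0.7 * length + 0.3 * questions


def semantic_cache_lookup(prompt: str) -> str | None:
    """Return a cached answer if an earlier prompt is close enough in embedding space."""
    if not _cache_vectors:
        return None
    q = encoder.encode(prompt, normalize_embeddings=True)
    sims = np.stack(_cache_vectors) @ q
    best = int(np.argmax(sims))
    return _cache_answers[best] if sims[best] >= CACHE_SIM_THRESHOLD else None


def route(prompt: str) -> str:
    """Pick the cheapest path that can still answer the request."""
    cached = semantic_cache_lookup(prompt)
    if cached is not None:
        return f"[cache] {cached}"
    model = STRONG_MODEL if complexity_score(prompt) > 0.5 else CHEAP_MODEL
    answer = f"answer from {model}"   # stand-in for the real provider call
    _cache_vectors.append(encoder.encode(prompt, normalize_embeddings=True))
    _cache_answers.append(answer)
    return answer


print(route("What's the capital of France?"))
print(route("Which city is the capital of France?"))   # may hit the semantic cache
```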

Contact

Ready to discuss ML projects and implementations. I respond personally.