MLOps Infrastructure
When ML grows faster than infrastructure
- Python
- Kubernetes
- MLflow
- GitHub Actions
- Docker
- Terraform
Designing ML systems that work on real traffic: search, generation, recommendations, pricing. Here's my path from business to ML development.
I finished my engineering degree in 2012. It didn't bring great revelations, but it taught me the main thing: break complex problems into parts and look for patterns.
At 20, I started working inside a large government machine and saw how decisions depend not on people but on regulations, approvals, and piles of paperwork.
At the start, everything was done manually through chats: ten rounds of edits, deadlines in Excel. No tracking; the rule was "whoever remembers is right".
When the team grew from 5 to 30+ people, I realized that half my time went to putting out fires and relaying tasks between people.
I had to redesign everything: I broke business processes into clear stages, implemented a CRM, and replaced manual reports with scripts. To test my hypotheses faster, I integrated APIs myself and wrote scripts for employees.
With the shift to a systematic approach, profitability grew by 22%, and overdue project tasks dropped to almost zero.
I realized: structure is power. It scales; people don't.
I started writing code more and more often. IT had what I lacked in marketing: transparent logic. There's input, there's code, there's a result.
Realizing that scaling the agency model wouldn't work, I started building my own IT projects: parsers and bots in Python, and services that tested how SQL data, business logic, and automation fit together into a working system.
My launches were engineering startup experiments: how the system works, where it breaks, how to simplify and scale it. Only one thing mattered to me: that everything worked not once, but every time.
Guides existed, but they were scattered: Jupyter, Colab, TensorFlow articles. I tried to build infrastructure around them and realized that without systematic knowledge I couldn't move forward.
I was interested in the engineering side of ML and Data Science: how infrastructure works, how models go from training to production.
To understand this in practice, I ran mini-services on Heroku, but hit their limit: I wanted to know how high-traffic systems hold up when a failure costs money. I decided to go after Big Tech knowledge. There were no technical vacancies at that moment, but my marketing experience opened a way in through Google, where I was responsible for marketing analytics and digital products.
Inside, I went through ML programs: first theory, then production tools. That's where I got my first experience with BigQuery and Airflow pipelines, and tested TFX. I understood how real systems work: deployment, logging, stability requirements. This gave me a foundation for a systematic approach to ML infrastructure.
Since May 2022 I've been responsible for the ML chain of a B2B platform. I built modules for pricing, description generation, and demand forecasting on XGBoost and scikit-learn, designed an end-to-end pipeline with auto-retraining, fallbacks, and full-chain monitoring, and ensured a 99.9% SLA at ~1 million predictions per day.
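The fallback logic in that pipeline comes down to a simple contract: the freshest model serves until it misbehaves, then a known-good baseline takes over. A minimal sketch; the class name and wiring are illustrative, not the production code:

```python
import logging

log = logging.getLogger("pricing")

class FallbackPredictor:
    """Serve the latest auto-retrained model; degrade to a known-good baseline on failure."""

    def __init__(self, primary, baseline):
        self.primary = primary    # freshest model from the retrain pipeline
        self.baseline = baseline  # last validated model, or a simple heuristic

    def predict(self, features):
        try:
            return self.primary.predict(features)
        except Exception:
            # any runtime failure serves the fallback instead of an error,
            # which is what keeps the SLA intact during a bad retrain
            log.exception("primary model failed; serving fallback")
            return self.baseline.predict(features)
```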
In 2023 I joined the engineering team of an AI platform for e-commerce, working on the architecture of multimodal search and recommendations: from training embeddings to online ranking.
Under the hood, CLIP models and LLMs convert text and image queries into unified vectors, a fast FAISS index pulls up candidates, and on top a hybrid of BM25 and a neural network re-ranks them.
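Roughly how that retrieve-then-re-rank stage fits together, as a sketch: the real CLIP encoder is replaced here by deterministic pseudo-random vectors, and the neural re-ranker by a simple linear blend of dense and BM25 scores (requires faiss-cpu and rank-bm25):

```python
import numpy as np
import faiss                     # pip install faiss-cpu
from rank_bm25 import BM25Okapi  # pip install rank-bm25

docs = ["red running shoes", "wireless noise-cancelling headphones", "leather wallet"]

def embed(texts):
    # stand-in for the real CLIP/LLM encoder: per-text pseudo-random vectors,
    # L2-normalized so that inner product equals cosine similarity
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % 2**32)
        v = rng.standard_normal(512).astype("float32")
        out.append(v / np.linalg.norm(v))
    return np.stack(out)

index = faiss.IndexFlatIP(512)   # exact inner-product index; prod would use IVF/HNSW
index.add(embed(docs))

bm25 = BM25Okapi([d.split() for d in docs])

def search(query, k=3, alpha=0.7):
    # stage 1: dense candidate retrieval from FAISS
    dense, ids = index.search(embed([query]), k)
    # stage 2: re-rank by blending dense and BM25 scores
    # (the production system uses a neural re-ranker at this step)
    lex = bm25.get_scores(query.split())
    scored = [(alpha * d + (1 - alpha) * lex[i], docs[i]) for d, i in zip(dense[0], ids[0])]
    return sorted(scored, reverse=True)

print(search("running shoes"))
```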
I implemented online fine-tuning on clicks: CTR grew by 14% and infrastructure costs fell by 30%.
Now my focus is platform architecture for ML products: auto-updating pipelines, observability, fault tolerance. I build systems so that engineers aren't stuck patching bugs.
The final test of an architecture is that it keeps working even while you're on vacation.
Where my experience was applied
Services
Solving engineering bottlenecks in ML production
When ML grows faster than infrastructure
CI/CD for ML: auto-deploy models with versioning, zero-downtime deployment, metrics via Prometheus + OTel
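One building block of that auto-deploy flow, sketched against the MLflow model registry. The metric name, threshold, and stage-based promotion are illustrative; newer MLflow versions express the same gate with model aliases:

```python
from mlflow.tracking import MlflowClient

client = MlflowClient()  # assumes MLFLOW_TRACKING_URI is set in the environment

def promote_if_better(name: str, version: str, min_auc: float = 0.85) -> bool:
    """Quality gate: move a registered model version to Production only if it clears the bar."""
    mv = client.get_model_version(name, version)
    metrics = client.get_run(mv.run_id).data.metrics
    if metrics.get("val_auc", 0.0) < min_auc:   # metric name is an assumption
        return False
    # the serving layer resolves the Production stage per request, so the old
    # version keeps answering until this call completes: a zero-downtime swap
    client.transition_model_version_stage(name, version, stage="Production")
    return True
```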
LLM inference without latency spikes or budget overruns
–42% cost/req through async RAG and fallback routing with cache, latency ~1.2s (Qdrant, FastAPI)
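A stripped-down sketch of that request path: exact-match caching in front, vector retrieval from a Qdrant collection (assumed here to be named "docs" with a "text" payload), and routing to a cheaper model when retrieval is weak. The encoder, the LLM call, and the in-process dict standing in for Redis are placeholders:

```python
import hashlib

from fastapi import FastAPI
from qdrant_client import AsyncQdrantClient

app = FastAPI()
qdrant = AsyncQdrantClient(url="http://localhost:6333")  # placeholder address
cache: dict[str, str] = {}                               # stand-in for Redis

def embed(text: str) -> list[float]:
    return [0.0] * 384          # placeholder: a real encoder goes here

async def generate(prompt: str, model: str) -> str:
    return f"[{model}] ..."     # placeholder: the real LLM call goes here

@app.get("/answer")
async def answer(q: str) -> dict:
    key = hashlib.sha256(q.encode()).hexdigest()
    if key in cache:            # cache hit skips retrieval and the LLM entirely
        return {"answer": cache[key], "cached": True}
    hits = await qdrant.search(collection_name="docs", query_vector=embed(q), limit=5)
    context = "\n".join(h.payload["text"] for h in hits)  # assumes a "text" payload
    # fallback routing: weak retrieval -> cheaper model (threshold is illustrative)
    model = "large" if hits and hits[0].score > 0.3 else "small"
    result = await generate(f"{context}\n\nQ: {q}", model)
    cache[key] = result
    return {"answer": result, "cached": False}
```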
Behavioral personalization with real-time response
Real-time recsys: embeddings + GBDT, caches (Kafka, Redis), feature pipeline on an in-house feature store
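Roughly how the read path of such a system can look: candidate features come from Redis (kept fresh by Kafka consumers, omitted here) and a GBDT scores the join. Feature names and key schema are illustrative:

```python
import json

import numpy as np
import redis
import xgboost as xgb

r = redis.Redis(host="localhost", port=6379)  # feature cache, fed by Kafka consumers

def rank(user_id: str, candidates: list[str], model: xgb.Booster) -> list[tuple[str, float]]:
    """Join cached user/item features and rank candidates with a GBDT."""
    user = json.loads(r.get(f"user:{user_id}") or "{}")
    rows = []
    for item_id in candidates:
        item = json.loads(r.get(f"item:{item_id}") or "{}")
        # illustrative feature vector; the real pipeline has far more features
        rows.append([user.get("ctr_7d", 0.0),
                     item.get("popularity", 0.0),
                     item.get("price", 0.0)])
    scores = model.predict(xgb.DMatrix(np.asarray(rows, dtype=np.float32)))
    return sorted(zip(candidates, scores.tolist()), key=lambda p: p[1], reverse=True)
```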
LLM inference without budget drain on every request
up to –50% cost/req through a scoring classifier (prompt length + token count) and a semantic answer cache
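The semantic-cache half of that, as a minimal sketch. The encoder is injected, and the 0.92 similarity threshold is a made-up starting point that would need tuning against false hits:

```python
import numpy as np

class SemanticCache:
    """Reuse a stored answer when a new prompt is close enough to a previous one."""

    def __init__(self, embed, threshold: float = 0.92):
        self.embed = embed          # any encoder returning L2-normalized vectors
        self.threshold = threshold  # cosine-similarity cutoff; tune on real traffic
        self.keys: list[np.ndarray] = []
        self.values: list[str] = []

    def get(self, prompt: str) -> str | None:
        if not self.keys:
            return None
        sims = np.stack(self.keys) @ self.embed(prompt)  # cosine on unit vectors
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, prompt: str, answer: str) -> None:
        self.keys.append(self.embed(prompt))
        self.values.append(answer)
```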
Contact
Ready to discuss ML projects and implementations. I respond personally.
About me
Senior/Staff ML Engineer. System design, ownership, high-traffic ML systems.
Fastest way
Write in Telegram