Projects

Here are my production ML projects, from generative AI and recommendations to MLOps infrastructure and observability. Each one covers the problem, the architecture, and the measured result.

Voice AI Operator for Call Center

On-prem voice AI operator that handles 72% of calls without a human, responds in 0.96 s, and cuts costs by 58%.

Client: NDA Domain: FinTech

Problem: A 600-seat contact center with 9-minute wait times, SLA penalties, and new AI Act requirements; regulations changed faster than operators could learn them.

Solution: On-prem stack with streaming, a model cascade, orchestration, and a knowledge base, plus safety rules and manual escalation to a human operator.
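To illustrate the cascade-with-escalation idea, here is a minimal sketch, not the production implementation. It assumes each stage returns a reply plus a confidence score, that a request falls through to the next stage when confidence is below that stage's floor, and that a safety rule or an exhausted cascade hands the call to a human. The names `Stage`, `route`, `BLOCKED_TERMS`, the toy models, and all thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One cascade stage: a model callable and the confidence it must reach."""
    name: str
    answer: Callable[[str], Tuple[str, float]]  # returns (reply, confidence)
    min_confidence: float


# Assumed safety rule: these topics always go to a human operator.
BLOCKED_TERMS = ("chargeback", "legal complaint")
HANDOFF = "Transferring you to an operator."


def route(utterance: str, stages: List[Stage]) -> Tuple[str, str]:
    """Return (handler, reply); escalate when no stage is confident enough."""
    text = utterance.lower()
    if any(term in text for term in BLOCKED_TERMS):
        return "human", HANDOFF
    for stage in stages:
        reply, confidence = stage.answer(utterance)
        if confidence >= stage.min_confidence:
            return stage.name, reply
    return "human", HANDOFF


# Toy stand-ins for real models, used only to exercise the routing logic.
def small_model(utterance: str) -> Tuple[str, float]:
    if "balance" in utterance.lower():
        return "Your balance is available in the app.", 0.95
    return "", 0.2


def large_model(utterance: str) -> Tuple[str, float]:
    if "card" in utterance.lower():
        return "I can help you block the card.", 0.85
    return "", 0.3


STAGES = [Stage("small", small_model, 0.9), Stage("large", large_model, 0.8)]

if __name__ == "__main__":
    for line in ("What is my balance?", "I lost my card", "Tell me a joke"):
        print(line, "->", route(line, STAGES)[0])
```

Ordering the stages from cheapest to most capable means the expensive model only runs on requests the cheap one could not answer confidently.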

ML Inference Latency and Cost Evaluation Platform

Internal tool for profiling latency, throughput, and $/req of models in production

Client: NDA Domain: E-commerce

Problem: No unified monitoring standard: teams deployed models ad hoc, GPUs sat idle, latency fluctuated, and costs were not tracked.

Solution: Built a platform on Prometheus, Kubecost, and Torch/ONNX profiling that makes latency, throughput, load, and $/req visible at the model level.
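The two headline metrics can be sketched with plain arithmetic. This is an illustrative example, not the platform's code: it assumes the hourly cost of a model's allocated resources and its sustained request rate are already known (for instance from a cost allocation report and a request counter), and it uses a nearest-rank percentile. The function names and sample numbers are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def cost_per_request(hourly_cost_usd: float, requests_per_second: float) -> float:
    """Dollars per request for a model served at a steady request rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return hourly_cost_usd / (requests_per_second * 3600)


if __name__ == "__main__":
    latencies_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]  # made-up samples
    print("p95 latency, ms:", percentile(latencies_ms, 95))
    print("$/req:", cost_per_request(hourly_cost_usd=3.6, requests_per_second=10))
```

Dividing by actual throughput rather than provisioned capacity is what makes idle GPUs show up as a higher $/req.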

RAG Assistant for Catalog

MVP chat search with deployment automation, experiments, and quality monitoring

Client: NDA Domain: E-commerce

Problem: 30% of queries returned no results, p95 latency was 1.5 s, and cloud LLM costs kept growing.

Solution: Hybrid search (vector + BM25) with a fine-tuned Mistral-7B, autoscaling in K8s, and low-cost inference on a single GPU.
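One common way to merge a BM25 ranking with a vector ranking is reciprocal rank fusion (RRF). The project description does not say which fusion method was used, so treat this as an assumed illustration: each retriever contributes `1 / (k + rank)` per document, and documents are sorted by the summed score. The function name, the constant `k = 60`, and the document IDs are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list.

    Each list contributes 1 / (k + rank) for every document it contains,
    with rank starting at 1. Ties are broken by document ID for determinism.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_hits = ["a", "b", "c"]    # hypothetical keyword results, best first
    vector_hits = ["b", "d", "a"]  # hypothetical embedding results, best first
    print(reciprocal_rank_fusion([bm25_hits, vector_hits]))
```

RRF only needs ranks, so BM25 scores and cosine similarities never have to be put on a common scale.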

Telegram Antifraud Analytics for Media Plans

Fraud detection system that cuts inefficient spending by 24% and automates verification of 100 channels in 12 minutes

Client: NDA Domain: AdTech

Problem: 30% of the ad budget was lost on fraudulent channels, and manual verification of 100 channels took 25 hours.

Solution: Hybrid detection system (rule-based + anomaly detection) with batch processing and adaptive per-topic thresholds
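A rough sketch of how a hard rule and a per-topic anomaly check can be combined in one batch pass. The feature (views per subscriber), the rule floor, and the median-minus-k-times-MAD threshold are all assumptions chosen for illustration; the actual signals and thresholds of the system are not described here. Channel names and numbers are made up.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class Channel:
    name: str
    topic: str
    subscribers: int
    avg_views: int


RULE_MIN_RATIO = 0.01  # assumed hard floor, applied regardless of topic


def view_ratio(ch: Channel) -> float:
    """Average views per subscriber; 0.0 for channels without subscribers."""
    return ch.avg_views / ch.subscribers if ch.subscribers > 0 else 0.0


def topic_lower_bounds(channels: List[Channel], k: float = 3.0) -> Dict[str, float]:
    """Per-topic threshold: median ratio minus k median absolute deviations."""
    by_topic: Dict[str, List[float]] = {}
    for ch in channels:
        by_topic.setdefault(ch.topic, []).append(view_ratio(ch))
    bounds = {}
    for topic, ratios in by_topic.items():
        med = median(ratios)
        mad = median([abs(r - med) for r in ratios])
        bounds[topic] = med - k * mad
    return bounds


def detect(channels: List[Channel]) -> Dict[str, List[str]]:
    """Return {channel name: reasons} for every flagged channel in the batch."""
    bounds = topic_lower_bounds(channels)
    flagged: Dict[str, List[str]] = {}
    for ch in channels:
        ratio = view_ratio(ch)
        reasons = []
        if ratio < RULE_MIN_RATIO:
            reasons.append("rule")
        if ratio < bounds[ch.topic]:
            reasons.append("anomaly")
        if reasons:
            flagged[ch.name] = reasons
    return flagged


CHANNELS = [
    Channel("news_a", "news", 1000, 300),
    Channel("news_b", "news", 1000, 320),
    Channel("news_c", "news", 1000, 280),
    Channel("news_d", "news", 1000, 50),
    Channel("crypto_a", "crypto", 1000, 100),
    Channel("crypto_b", "crypto", 1000, 120),
    Channel("crypto_c", "crypto", 1000, 110),
    Channel("crypto_d", "crypto", 1000, 5),
]

if __name__ == "__main__":
    print(detect(CHANNELS))
```

In this toy data a ratio of 0.10 is normal for the "crypto" topic but would be anomalous for "news", which is the reason for computing thresholds per topic.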

Search and Recommendation System

Multimodal search and recommendation platform with full CI/CD pipeline, monitoring, and A/B experiments

Client: NDA Domain: E-commerce

Problem: High cost and p95 latency above 500 ms when searching a catalog of 10M+ products

Solution: End-to-end architecture with CLIP encoder, HNSW (FAISS), TensorRT optimization, and canary A/B deployment
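The canary part of such a rollout is often implemented as deterministic, hash-based traffic splitting. The sketch below shows that idea under stated assumptions: users are bucketed by a SHA-256 hash of their ID, and the canary receives the lowest buckets. The function names and the 100-bucket scheme are hypothetical, not taken from the project.

```python
import hashlib


def bucket(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign_variant(user_id: str, canary_percent: int) -> str:
    """Send roughly canary_percent of users to the canary, the rest to stable."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket(user_id) < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    share = sum(assign_variant(u, 10) == "canary" for u in users) / len(users)
    print(f"canary share at 10%: {share:.3f}")
```

Because the bucket depends only on the user ID, a user keeps the same variant across requests, and raising the percentage only adds users to the canary without reshuffling those already in it.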