Projects

Production ML case studies across search, recommendations, GenAI/RAG, and ML platform work. Each one shows the operating problem, the architecture, and the measurable outcome.

Voice AI Operator for Contact Center

An on-prem voice AI operator for a financial contact center that automated 72% of inbound calls, reduced cost per call by 58%, and stayed inside strict compliance boundaries.

Client: NDA
Domain: FinTech

Problem: A 600-seat financial contact center had nine-minute queue times, SLA penalties, and new AI governance requirements that made cloud-heavy automation hard to justify.

Solution: An on-prem voice AI stack with streaming ASR, a model cascade, RAG-backed answer grounding, safety controls, and guaranteed human escalation.
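As a rough illustration of how a model cascade with guaranteed human escalation can be wired together, here is a minimal sketch. It is not the production routing logic: the `Answer`, `Stage`, and `route` names, the confidence scores, and the thresholds are all hypothetical, and the models are plain callables standing in for real inference services.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

HUMAN_AGENT = "human_agent"  # hypothetical label for the escalation path


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be calibrated to the range [0, 1]


@dataclass
class Stage:
    name: str
    model: Callable[[str], Answer]  # stand-in for a real inference call
    min_confidence: float           # accept the answer at or above this value


def route(utterance: str, stages: Sequence[Stage]) -> Tuple[str, str]:
    """Try stages from cheapest to most capable; escalate if none is confident.

    Returns (handler_name, answer_text). A stage that raises is skipped, so
    every call still ends at an automated answer or the human agent.
    """
    for stage in stages:
        try:
            answer = stage.model(utterance)
        except Exception:
            continue  # a failing model must not block escalation
        if answer.confidence >= stage.min_confidence:
            return stage.name, answer.text
    return HUMAN_AGENT, ""
```

The design point is that escalation is the default return value rather than a special case, so a low-confidence answer or a model outage both end with a human operator.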

ML Inference Latency and Cost Evaluation Platform

An internal ML platform for benchmarking latency, throughput, GPU utilization, and cost per request so teams could ship models with consistent release criteria.

Client: NDA
Domain: E-commerce

Problem: Teams were deploying inference services without a shared operating standard, which meant unstable latency, poor GPU utilization, and no reliable view of cost per request.

Solution: A production benchmarking and observability platform built around profiling, Prometheus metrics, Kubecost attribution, and model-level release decision support.
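To make "cost per request" and "release criteria" concrete, the sketch below shows one way the arithmetic could look. It is an assumption-laden illustration, not the platform's code: the function names, the `ReleaseCriteria` fields, and the nearest-rank percentile are hypothetical choices, and in practice the inputs would come from Prometheus metrics and Kubecost attribution rather than Python lists.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile: smallest sample with at least q% at or below it."""
    if not samples:
        raise ValueError("percentile of empty sample set")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]


def cost_per_request(hourly_cost_usd: float, requests_per_second: float) -> float:
    """Hourly infrastructure cost divided by requests served in one hour."""
    if requests_per_second <= 0:
        raise ValueError("throughput must be positive")
    return hourly_cost_usd / (requests_per_second * 3600)


@dataclass
class ReleaseCriteria:  # hypothetical release gate
    max_p95_ms: float
    max_cost_per_request_usd: float


def passes_release(latencies_ms: Sequence[float], hourly_cost_usd: float,
                   requests_per_second: float, criteria: ReleaseCriteria) -> bool:
    """A model passes only if both the latency and the cost budgets hold."""
    p95 = percentile(latencies_ms, 95)
    cost = cost_per_request(hourly_cost_usd, requests_per_second)
    return p95 <= criteria.max_p95_ms and cost <= criteria.max_cost_per_request_usd
```

For example, a node that costs 3.60 USD per hour and sustains 100 requests per second serves 360,000 requests per hour, which works out to 0.00001 USD per request.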

RAG Assistant for Catalog

A production RAG catalog assistant with hybrid retrieval, reranking, and cost-aware serving that cut zero-result searches and improved CTR inside a one-GPU operating envelope.

Client: NDA
Domain: E-commerce

Problem: Thirty percent of catalog queries returned no results, p95 latency was above 1.5 seconds, and the cloud LLM bill was climbing toward an unsustainable level.

Solution: A hybrid retrieval stack with semantic search, BM25 fallback, reranking, and a fine-tuned Mistral-7B served through a cost-aware production path.
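The idea of semantic search with a BM25 fallback can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the deployed retrieval path: `semantic_search` is a hypothetical callable standing in for the vector index, `min_score` is an illustrative cutoff, the BM25 scorer is a small pure-Python version using the non-negative IDF variant, and reranking is left out.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (doc_id, score)


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class BM25:
    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(t)) for doc_id, t in docs.items()}
        self.length = {doc_id: sum(c.values()) for doc_id, c in self.tf.items()}
        self.n_docs = len(docs)
        self.avg_len = sum(self.length.values()) / max(1, self.n_docs)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for c in self.tf.values() for term in c)

    def search(self, query: str, k: int) -> List[Hit]:
        scores: Dict[str, float] = {}
        for term in set(tokenize(query)):
            n = self.df.get(term, 0)
            if n == 0:
                continue
            idf = math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))
            for doc_id, counts in self.tf.items():
                f = counts.get(term, 0)
                if f == 0:
                    continue
                norm = f + self.k1 * (1 - self.b + self.b * self.length[doc_id] / self.avg_len)
                scores[doc_id] = scores.get(doc_id, 0.0) + idf * f * (self.k1 + 1) / norm
        # Sort by descending score, then doc_id for a stable order.
        return sorted(scores.items(), key=lambda hit: (-hit[1], hit[0]))[:k]


def hybrid_search(query: str, semantic_search: Callable[[str, int], List[Hit]],
                  bm25: BM25, k: int = 5, min_score: float = 0.5) -> List[str]:
    """Keep confident semantic hits, then fill remaining slots from BM25."""
    results = [doc_id for doc_id, score in semantic_search(query, k) if score >= min_score]
    if len(results) < k:
        for doc_id, _ in bm25.search(query, k):
            if doc_id not in results:
                results.append(doc_id)
            if len(results) == k:
                break
    return results[:k]
```

The lexical fallback only runs when the semantic stage leaves slots unfilled, which is one way to attack zero-result queries without paying for both retrievers on every request.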

Telegram Antifraud Analytics for Media Plans

A production antifraud analytics system for Telegram media buying that cut inefficient spend, compressed batch review from 25 hours to 12 minutes, and kept explainability attached to every verdict.

Client: NDA
Domain: AdTech

Problem: Media teams were losing roughly 30% of ad budget to manipulated Telegram channels, while manual review of 100 channels could take up to 25 analyst hours.

Solution: A hybrid fraud-detection stack combining rules, anomaly scoring, topic-aware thresholds, batch processing, and one-click feedback loops through a Telegram bot.
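One way to picture how rules, anomaly scoring, and topic-aware thresholds can combine while keeping an explanation attached to every verdict is the sketch below. Everything in it is hypothetical: the `Channel` fields, the 2% view-ratio rule, the robust z-score on subscriber growth, and the threshold values are illustrative assumptions, not the signals or cutoffs used in the actual system.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import Dict, List, Sequence

# Hypothetical per-topic anomaly limits; unknown topics use the default.
TOPIC_THRESHOLDS: Dict[str, float] = {"news": 4.0, "crypto": 3.0}
DEFAULT_THRESHOLD = 3.5
MIN_VIEW_RATIO = 0.02  # illustrative rule: views per subscriber


@dataclass
class Channel:
    name: str
    topic: str
    subscribers: int
    avg_views: float
    daily_growth: float  # new subscribers per day


@dataclass
class Verdict:
    channel: str
    flagged: bool
    reasons: List[str] = field(default_factory=list)


def robust_z(value: float, reference: Sequence[float]) -> float:
    """Z-score based on median and MAD, less sensitive to outliers than mean/std."""
    med = median(reference)
    mad = median([abs(x - med) for x in reference])
    if mad == 0:
        return 0.0  # no spread in the reference set, so no anomaly signal
    return 0.6745 * (value - med) / mad


def assess(channel: Channel, peer_growth: Sequence[float]) -> Verdict:
    """Flag a channel if a rule fires or growth is anomalous for its topic."""
    reasons: List[str] = []
    if channel.subscribers > 0 and channel.avg_views / channel.subscribers < MIN_VIEW_RATIO:
        reasons.append("rule: view-to-subscriber ratio below minimum")
    limit = TOPIC_THRESHOLDS.get(channel.topic, DEFAULT_THRESHOLD)
    z = robust_z(channel.daily_growth, peer_growth)
    if abs(z) > limit:
        reasons.append(f"anomaly: growth z-score {z:.1f} exceeds topic limit {limit}")
    return Verdict(channel.name, bool(reasons), reasons)
```

Because each check appends its own reason string, an analyst reviewing a flagged channel sees which signal fired instead of an opaque score.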

Search and Recommendation System

A multimodal search and recommendation platform for 10M+ products that improved CTR, cut latency, and lowered serving cost through disciplined production architecture.

Client: NDA
Domain: E-commerce

Problem: Search across more than 10 million products was too slow, too expensive, and too brittle to support commercial discovery at scale.

Solution: A multimodal retrieval and recommendation platform built around CLIP embeddings, FAISS HNSW, TensorRT optimization, reranking, and controlled rollout.