Telegram Antifraud Analytics for Media Plans

A production antifraud analytics system for Telegram media buying that cut inefficient spend, compressed batch review from 25 hours to 12 minutes, and kept explainability attached to every verdict.

One-liner: This Telegram antifraud system reduced inefficient media spend by 24%, compressed review of 100 channels from 25 hours to 12 minutes, and delivered a fraud-verification path with explicit explainability.

Executive summary

Telegram media buying is an ugly decision environment. Subscriber counts, views, reactions, and post timing can all be manipulated, and the more sophisticated channels do not look obviously fake. Media planners therefore end up paying for channels that look healthy on the surface but burn budget through inflated engagement and fabricated reach. At the same time, manual verification does not scale. Reviewing 100 channels could take about 25 analyst hours, which is too slow for real media planning cadence.

This project replaced that manual bottleneck with a hybrid fraud analytics system that scored channels in batch, generated clear buy / hold / avoid verdicts, and attached a transparent explanation to every decision. The system reduced inefficient spend by 24%, held Precision@Fraud at 0.90 with Recall@Fraud around 0.70, and brought the p95 turnaround for 100-channel review down to 12 minutes.

| Metric | Before | After | Why it mattered |
| --- | --- | --- | --- |
| Inefficient spend | baseline | -24% | Media plans shifted away from manipulated channels |
| Review time for 100 channels | 25 hours | 12 min p95 | Channel audit became operationally usable during planning windows |
| Precision@Fraud | n/a | 0.90 | High-risk flags became credible enough for business decisions |
| Recall@Fraud | n/a | 0.70 | The system still captured enough fraud to matter commercially |
| False-positive rate | 22% | 9% | Better thresholds reduced waste from over-blocking good inventory |
| Explainability coverage | fragmented | 100% | Every verdict carried a top-driver explanation |

This case aligns most closely with Production ML Release Gates and the MLOps and Reliability topic page, because the hard part was not only fraud detection. It was building a production decision system that could explain itself, recalibrate safely, and remain economically useful.

Why the problem was worth solving

The business pain was straightforward and measurable.

  • Agencies were losing roughly 27% to 30% of media budget on low-quality or manipulated channels.
  • Review teams were too slow to keep up with the volume and velocity of media planning.
  • Existing market analytics tools exposed descriptive metrics but did not provide a calibrated fraud operating point tuned to the client’s cost of error.
  • There was no unified “channel quality” score that a planner could actually use in a buy decision.

The system therefore needed to do more than detect anomalies. It needed to produce a decision artifact that a media planner could trust under deadline pressure.

Operating targets

The project targets were deliberately set as a mix of model quality, operational speed, and product usability.

| Metric | Target | Result |
| --- | --- | --- |
| Precision@Fraud | at least 0.90 | 0.90 |
| Recall@Fraud | around 0.70 | 0.70 |
| MCC | at least 0.55 | 0.58 |
| False-positive rate | below 12% | 9% |
| 100-channel audit p95 | at most 15 min | 12 min |
| Data freshness | at most 24 h | 12 h or better |
| Feedback recalibration p95 | at most 5 min | under 5 min |
| Explainability coverage | 100% | 100% |

These targets mattered because the system was not allowed to be clever but opaque. It had to be fast, precise enough, and interpretable.

How the system worked

Figure: Telegram antifraud pipeline, from public-signal collection through feature engineering, scoring, verdict generation, and analyst feedback.

The runtime followed six major stages.

1. Data collection

The collector pulled only public Telegram channel signals. It used rate-limit-aware queues, batching, and backpressure to survive the source constraints without breaking review SLAs.
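The queueing behavior described above can be sketched as a small rate-limited batch drainer. This is a minimal illustration under assumed semantics, not the production collector; the class and method names are hypothetical.

```python
import time
from collections import deque

class RateLimitedCollector:
    """Sketch of a rate-limit-aware queue with batching (names hypothetical)."""

    def __init__(self, max_calls_per_sec, batch_size):
        self.min_interval = 1.0 / max_calls_per_sec
        self.batch_size = batch_size
        self.queue = deque()
        self._last_call = 0.0

    def enqueue(self, channel_ids):
        # In production, backpressure comes from bounded queues upstream;
        # here we simply accept work and drain it under the rate limit.
        self.queue.extend(channel_ids)

    def drain(self, fetch_batch):
        """Drain the queue in batches, sleeping to respect the source rate limit."""
        results = []
        while self.queue:
            batch = [self.queue.popleft()
                     for _ in range(min(self.batch_size, len(self.queue)))]
            wait = self.min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()
            results.extend(fetch_batch(batch))
        return results
```

The point of the sketch is the shape of the loop: batching amortizes per-call overhead, while the sleep keeps the collector inside the source's rate limits instead of failing and retrying.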

2. Feature engineering

The system built channel-level features around engagement behavior, growth behavior, variance patterns, reaction patterns, late-view behavior, and additional anomaly flags. This is where the business advantage started. The pipeline did not treat “fraud” as one vague indicator. It decomposed channel quality into interpretable signals.
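A minimal sketch of what such channel-level features might look like. The field names and the exact feature set here are illustrative, not the production schema.

```python
from statistics import mean, pstdev

def channel_features(posts, subscribers):
    """Sketch of interpretable channel-level features (field names hypothetical).

    `posts` is a list of dicts with per-post `views`, `reactions`,
    and `late_views` (views arriving after 24 hours)."""
    views = [p["views"] for p in posts]
    reactions = [p["reactions"] for p in posts]
    late = [p["late_views"] for p in posts]
    avg_views = mean(views)
    return {
        # Engagement rate: average reach relative to the subscriber base
        "er": avg_views / subscribers,
        # Reactions per view: fabricated reach often shows too few reactions
        "reactions_per_view": sum(reactions) / sum(views),
        # Coefficient of variation of views: bot traffic is often too stable
        "views_cv": pstdev(views) / avg_views,
        # Late-view share: views arriving after 24h can indicate purchased views
        "late_view_share": sum(late) / sum(views),
    }
```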

3. Rules and anomaly scoring

A hybrid scoring layer combined deterministic rules with anomaly-detection logic. This gave the team the control of an explainable system without giving up the ability to catch less obvious patterns.
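A toy illustration of the hybrid idea: deterministic rules produce explainable flags, a z-score captures less obvious deviations from the peer group, and the two are blended into one risk value. The thresholds and blend weights below are invented for the sketch, not the production values.

```python
def rule_flags(features):
    """Deterministic, explainable rules (thresholds illustrative)."""
    flags = []
    if features["er"] > 0.5:
        flags.append("er_too_high")
    if features["views_cv"] < 0.05:
        flags.append("views_too_stable")
    if features["late_view_share"] > 0.4:
        flags.append("late_view_spike")
    return flags

def anomaly_score(value, peer_mean, peer_std):
    """Simple z-score anomaly relative to the peer group."""
    if peer_std == 0:
        return 0.0
    return abs(value - peer_mean) / peer_std

def hybrid_score(features, peers):
    """Blend rule hits with the worst peer-group anomaly into a 0-1 risk score.

    `peers` maps feature name -> (peer_mean, peer_std)."""
    rule_part = min(len(rule_flags(features)) / 3, 1.0)
    z = max(anomaly_score(features[k], *peers[k]) for k in peers)
    anomaly_part = min(z / 4, 1.0)  # saturate at four sigma
    return 0.5 * rule_part + 0.5 * anomaly_part
```

The design intuition is the one the text describes: rules keep the system controllable and explainable, while the anomaly term catches manipulation that no single rule anticipated.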

4. Topic-aware thresholding

Channel behavior is not uniform across domains. Entertainment, news, and finance channels have different normal engagement ranges. The system therefore used a topic classifier plus size-aware baselines so that “suspicious” meant suspicious relative to the correct peer group.
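In code, the core of topic-aware thresholding can be as simple as a per-topic lookup. The ER ranges below come from the topic table later in this case; the function name is illustrative, and the production system also layered in size-aware baselines.

```python
# Normal engagement-rate ranges per topic, from the topic-aware modeling table
TOPIC_ER_RANGES = {
    "entertainment": (0.08, 0.30),
    "news": (0.03, 0.12),
    "finance": (0.04, 0.18),
}

def er_is_suspicious(er, topic):
    """Flag an engagement rate only relative to the channel's own topic peers."""
    lo, hi = TOPIC_ER_RANGES[topic]
    return er < lo or er > hi
```

The same 25% ER that is normal for an entertainment channel is well outside the healthy band for a news channel, which is exactly why a single global baseline over-flags some categories and under-flags others.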

5. Verdict generation

Channels were scored on a 0 to 100 FraudScore and mapped to buy / hold / avoid decisions. Every verdict included the strongest contributing signals rather than a mysterious single score.
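A hedged sketch of the verdict mapping; the cutoffs are illustrative placeholders, not the calibrated production operating points, and the reason list stands in for the "strongest contributing signals" the text describes.

```python
def verdict(fraud_score, top_signals):
    """Map a 0-100 FraudScore to a planner decision (cutoffs illustrative)."""
    if fraud_score >= 70:
        decision = "avoid"
    elif fraud_score >= 40:
        decision = "hold"
    else:
        decision = "buy"
    # Every verdict carries its top drivers, never a bare score
    return {"decision": decision, "score": fraud_score, "reasons": top_signals[:3]}
```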

6. Feedback loop

Analysts reviewed outcomes through a Telegram bot and could confirm or reject a verdict with one click. That feedback updated recalibration logic and fed model review without turning the system into an opaque self-learning black box.

FraudScore design

Figure: FraudScore built from six interpretable submetrics, thresholded under a cost-sensitive policy into the final operating decision.

The system used six interpretable submetrics:

  1. engagement-rate anomalies
  2. subscriber-growth anomalies
  3. coverage stability and coefficient-of-variation behavior
  4. reactions versus expected response
  5. late views after 24 hours
  6. residual anomalies such as zero-engagement or duplicate-content patterns
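The combination step can be illustrated as a weighted sum of normalized submetrics scaled to 0-100. The weights below are placeholders; the production weights were tuned under the cost-sensitive policy, not hand-set.

```python
# Illustrative weights for the six submetrics (production weights were tuned)
WEIGHTS = {
    "er_anomaly": 0.20,
    "growth_anomaly": 0.20,
    "coverage_cv": 0.15,
    "reaction_mismatch": 0.15,
    "late_views": 0.20,
    "residual": 0.10,
}

def fraud_score(submetrics):
    """Weighted sum of the six normalized (0-1) submetrics, scaled to 0-100."""
    assert set(submetrics) == set(WEIGHTS)
    return 100 * sum(
        WEIGHTS[k] * min(max(v, 0.0), 1.0) for k, v in submetrics.items()
    )
```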

The final score was not just a sum of weird heuristics. It was a production operating point tuned under asymmetric error cost. In practice, the team optimized for expected loss rather than raw accuracy because false negatives and false positives do not cost the business the same amount.

That is the right production framing:

  • a false negative means buying fraudulent inventory
  • a false positive means discarding legitimate reach

The threshold was therefore calibrated under a cost-sensitive objective instead of a vanity metric.
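Expected-cost thresholding can be sketched as a grid search that charges false negatives and false positives different amounts. The cost values, grid, and function names are illustrative, assuming labeled (probability, is_fraud) pairs.

```python
def expected_cost(threshold, scored, cost_fn, cost_fp):
    """Total loss of a threshold on labeled (fraud_probability, is_fraud) pairs."""
    cost = 0.0
    for p, is_fraud in scored:
        flagged = p >= threshold
        if is_fraud and not flagged:
            cost += cost_fn  # false negative: bought fraudulent inventory
        elif flagged and not is_fraud:
            cost += cost_fp  # false positive: discarded legitimate reach
    return cost

def best_threshold(scored, cost_fn, cost_fp, grid=None):
    """Pick the operating point that minimizes expected loss, not accuracy."""
    grid = grid or [i / 100 for i in range(101)]
    return min(grid, key=lambda t: expected_cost(t, scored, cost_fn, cost_fp))
```

When a missed fraud costs more than a wrongly rejected channel (`cost_fn > cost_fp`), the optimal threshold moves lower than the accuracy-optimal one, which is the asymmetry the text describes.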

Reporting and planner workflow

Figure: Review workflow from batch output through planner export, analyst dispute handling, and safe recalibration.

The output was intentionally operational:

  • XLSX reports for media-planning workflows
  • JSON API for integration into internal systems
  • channel-level verdicts with top contributing reasons
  • estimated waste for budget planning

That is what turned the model into a workflow capability. Media teams did not need to inspect raw feature tables. They needed a review artifact they could use quickly and then contest or confirm through a feedback path.

Validation and evidence

The validation layer was strong enough to defend the system in front of skeptical operators.

  • more than 850 channels in the labeled dataset
  • dual-expert labeling with Cohen’s kappa of 0.78
  • group-aware splits to reduce leakage
  • bootstrap confidence intervals for precision, recall, and MCC
  • probability calibration with Brier and ECE monitoring
  • time-aware validation windows so the system did not overfit one historical snapshot
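The bootstrap step above can be sketched as a percentile confidence interval for precision over channel-level predictions. Function names and the resampling count are illustrative.

```python
import random

def precision(pairs):
    """`pairs`: (predicted_fraud, actually_fraud) booleans per channel."""
    tp = sum(1 for pred, true in pairs if pred and true)
    fp = sum(1 for pred, true in pairs if pred and not true)
    return tp / (tp + fp) if tp + fp else 0.0

def bootstrap_ci(pairs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample channels with replacement,
    recompute the metric, and take the alpha/2 tails."""
    rng = random.Random(seed)
    stats = sorted(
        precision([rng.choice(pairs) for _ in pairs]) for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Reporting an interval rather than a point estimate is what made numbers like "Precision 0.90" defensible on a dataset of roughly 850 labeled channels.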

The holdout operating point delivered:

  • Precision@Fraud 0.90
  • Recall@Fraud 0.70
  • MCC 0.58
  • false-positive rate 9%

This is the kind of evidence that actually matters in production. It is not “the model did well on a benchmark.” It is “the decision policy landed where the business needed it to land.”

Topic-aware modeling

One of the better production decisions in the system was to avoid one global fraud baseline.

| Topic | Typical ER range | Operational implication |
| --- | --- | --- |
| Entertainment | 8% to 30% | High interaction is not automatically suspicious |
| News | 3% to 12% | Stable reach with lower reaction volume is normal |
| Finance | 4% to 18% | Interaction patterns are lower and more concentrated |

A lightweight TF-IDF plus logistic-regression topic classifier supported this layer. It reached macro-F1 of about 0.84 with latency under 50 ms, which was good enough for routing and adaptive thresholding without introducing a heavy inference dependency.

This is a strong example of production pragmatism. The topic classifier did not need to be state of the art. It needed to be cheap, stable, and good enough to stop the fraud system from overgeneralizing across very different channel categories.

Economics

The case became commercially compelling because the scoring system connected directly to budget waste.

For a typical RUB 9M monthly media plan:

| Scenario | Fraud share | Budget loss | Savings vs. pre-system |
| --- | --- | --- | --- |
| Before deployment | 27% | RUB 2.43M | baseline |
| After deployment, conservative | 15% | RUB 1.35M | RUB 1.08M |
| After deployment, observed case | 11% | RUB 990k | RUB 1.44M |
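The arithmetic behind the table is a straight-line waste estimate: fraud share times monthly budget, using the figures from the scenario rows.

```python
def budget_waste(monthly_budget, fraud_share):
    """Straight-line waste estimate: budget lost to manipulated channels."""
    return monthly_budget * fraud_share

budget = 9_000_000                       # RUB, typical monthly media plan
before = budget_waste(budget, 0.27)      # pre-system loss
after = budget_waste(budget, 0.11)       # observed post-deployment loss
savings = before - after                 # monthly savings vs. baseline
```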

The unit economics were also clear:

  • about RUB 8 to 20 per channel at batch volumes above 5,000
  • about RUB 30 to 80 per channel below 1,000, where collection and caching overhead dominate more of the cost structure

That level of clarity matters when the buyer is not a research team. It is a media operation trying to decide whether the system pays for itself. Here it clearly did.

Impact after four weeks

| Metric | Before | After | Delta |
| --- | --- | --- | --- |
| Fraud share in media plans | 27% | 11% | -59% |
| Analyst hours for 100 channels | 25 h | 0.2 h | -99% |
| False-positive rate | 22% | 9% | -59% |
| Time to report | 24 to 48 h | 0.08 h | -99% |
| Disputed cases per week | 18 | 4 | -78% |

The most telling part is not just speed. It is that disputed cases went down while the system got faster. That means the model was not simply pushing more automated noise downstream.

By the time the case was documented, the system had analyzed more than 10,000 channels and helped agencies avoid tens of millions of rubles in ineffective placement.

One-click feedback and safe self-improvement

The feedback loop deserves separate attention because it is where many fraud systems become unstable.

Analysts could upvote or downvote a verdict directly in the Telegram bot. That signal was written into the feedback layer and used for:

  • threshold recalibration by topic and size segment
  • weekly topic-classifier refits in shadow mode first
  • promotion only after quality checks
  • rollback if MCC or PR-AUC slipped below baseline
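The promotion and rollback rule above can be sketched as a gate: compute MCC from the confusion matrix of the shadow-mode refit and promote only if no tracked metric slips below baseline. Metric names and the tolerance parameter are illustrative.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def promote(shadow_metrics, baseline_metrics, tolerance=0.0):
    """Promote a shadow-mode refit only if it does not slip below baseline
    on any gated metric; otherwise keep (or roll back to) the baseline."""
    return all(
        shadow_metrics[k] >= baseline_metrics[k] - tolerance
        for k in ("mcc", "pr_auc")
    )
```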

The system also included anti-poisoning protections:

  • weighting based on labeler trust history
  • quorum on contested changes
  • collusion-pattern detection
  • per-user rate limiting

This is important. “Self-learning” is usually where production language gets sloppy. In practice, safe improvement requires gating, rate limits, promotion rules, and rollback. Otherwise the feedback loop becomes a corruption path.

My role

I owned the ML and production decision layer for the system:

  • designed the six-part FraudScore and the thresholding policy
  • implemented topic-aware ranges that reduced false positives materially
  • optimized the batch path to hit the 100-channels-in-12-minutes objective
  • trained and deployed the topic classifier
  • built the recalibration and drift-monitoring loop
  • packaged the system into planner-ready reporting and ROI logic

This was not a notebook-only fraud model. It was a production analytics system designed to support money-moving decisions.

Technical annex

Six submetrics in more detail

The six interpretable submetrics were:

  • engagement-rate anomalies relative to topic and size
  • subscriber-growth spikes against expected baseline
  • suspiciously low or high coverage stability
  • reactions misaligned with expected view patterns
  • unusually high late-view share after 24 hours
  • residual anomalies such as duplicate content and zero-engagement behavior

The score was then calibrated into a probability and mapped into an operating decision. If explainability was incomplete, the system would not issue the strongest avoid verdict.

Validation policy

The validation path used holdout splits by channel identity, time-aware windows, and bootstrap confidence intervals. The business threshold was set by expected-cost minimization rather than a one-size-fits-all accuracy metric.

Infrastructure and API path

The system ran on Python 3.11 with FastAPI, PostgreSQL, Redis, Celery, and structured logging. Reports were generated as XLSX for media teams, while JSON responses supported system integration. Docker-based deployment kept the runtime simple enough for a small product team to operate.

Governance and retention

The system used only public data, stored decision logs for 90 days, and retained aggregate data longer for operational analysis. Explainability and audit trail were treated as product requirements, not internal debugging conveniences.

What this case proves

This case proves that antifraud analytics becomes strategically valuable only when it is attached to planner workflow, explainability, and cost-aware thresholds. A model that only says “this looks suspicious” is not enough. A production system needs to tell a team what to do, how confident to be, and how to learn safely from disagreement.

That is what this project delivered: not just fraud detection, but a decision system for media planning under uncertainty.

Bottom line

The platform reduced budget waste, accelerated review speed, and made fraud decisions explainable enough for real operations. It combined rules, anomaly logic, feedback, and planner workflow into one production path. For an adtech environment where bad inventory can quietly destroy ROI, that is the difference between analytics and real control.

FAQ

What did the system actually classify?

It scored Telegram channels for purchase risk by combining behavioral metrics, anomaly patterns, topic-aware baselines, and transparent verdict rules that media planners could review.

Why not use a single end-to-end ML classifier?

The client needed explainable production decisions and fast iteration on failure cases. A hybrid rules-plus-anomaly design gave better control over precision, feedback, and cost-sensitive thresholding.

How did the system keep improving after launch?

Analyst feedback from the Telegram bot fed recalibration and weekly model updates, while drift and quality metrics were monitored so threshold changes did not silently degrade performance.

Contact

If you have a production ML problem worth fixing, send the context and I'll reply directly.