Detection signals: how automated poker play gets flagged
Summary: Because cryptography can't identify a bot, detection is a behavioral problem. The strongest signals are statistical, not visual: the shape of a player's action-timing distribution, the entropy of their bet sizing, and multi-tabling patterns that exceed human capacity. No single signal is proof; detection systems combine many weak signals into a confidence score, then route the high-confidence cases to human review.
Detection rarely catches a bot in a single hand. It catches it the way you'd catch a loaded die — not by looking at one roll, but by watching the distribution drift away from what a human produces. This note walks through the three signal families that carry the most weight, framed for anyone building or studying these systems.
Signal 1 — Action-timing distributions
The single most reliable behavioral signal is time-to-act: how long a player takes to make each decision. Humans produce a broad, right-skewed distribution — fast on trivial spots, slow on hard ones, with natural noise from attention, fatigue, and tabbing between tables. A naive bot, by contrast, acts on a tight internal clock and produces a sharp spike.
The giveaway is not the average: a bot can be tuned to a human-looking mean. It's the shape. A human histogram is wide and lumpy; a scripted one is narrow, or shows artificial regularity such as quantization to fixed intervals. Plot enough decisions and the two separate cleanly.
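One way to turn "the shape, not the average" into a number is to compare a player's histogram of log time-to-act against a reference histogram pooled from known-human play. The sketch below is a minimal illustration under stated assumptions: times are in milliseconds and strictly positive, Jensen-Shannon divergence is used as the distance (one reasonable choice among several), and the bin width and the lognormal "human" samples in the demo are illustrative assumptions, not measured data.

```python
import math
import random
import statistics
from collections import Counter

def log_histogram(times_ms, bin_width=0.25):
    """Normalized histogram of ln(time-to-act); log-spaced bins suit right-skewed timing data."""
    counts = Counter(math.floor(math.log(t) / bin_width) for t in times_ms)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical shape, 1 = disjoint support."""
    m = {b: 0.5 * (p.get(b, 0.0) + q.get(b, 0.0)) for b in set(p) | set(q)}

    def kl_to_m(dist):
        return sum(pr * math.log2(pr / m[b]) for b, pr in dist.items() if pr > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)

def timing_shape_divergence(player_ms, reference_ms):
    return js_divergence(log_histogram(player_ms), log_histogram(reference_ms))

def coefficient_of_variation(times_ms):
    """Spread relative to the mean; a tuned bot can match the mean but not the spread."""
    return statistics.pstdev(times_ms) / statistics.fmean(times_ms)

# Demo with synthetic data (assumed shapes, not real player data).
rng = random.Random(0)
reference = [rng.lognormvariate(math.log(2000), 0.8) for _ in range(5000)]
human = [rng.lognormvariate(math.log(2000), 0.8) for _ in range(5000)]
# A naive bot tuned to the human mean but acting on a tight internal clock.
mean_ms = statistics.fmean(reference)
bot = [mean_ms + rng.uniform(-30, 30) for _ in range(5000)]

print(timing_shape_divergence(human, reference), timing_shape_divergence(bot, reference))
print(coefficient_of_variation(human), coefficient_of_variation(bot))
```

The bot's mean matches the reference by construction, yet it collapses into a single log-bin, so its divergence is large while a second human sample stays close to zero.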
Sophisticated bots fight back by adding randomized delays drawn from a human-like distribution. That raises the bar but rarely closes the gap, because human timing is context-dependent — correlated with hand difficulty, stack depth, and prior action. Reproducing the marginal distribution is easy; reproducing the conditional structure (slow exactly when the spot is hard) is much harder, and mismatches there are themselves a signal.
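The conditional check can be sketched as a rank (Spearman) correlation between a per-decision difficulty score and time-to-act. How difficulty is measured is an assumption here (for example, a small EV gap between the two best actions according to a solver); the sketch simply takes it as one number per decision. The demo builds a synthetic human whose timing rises with difficulty, then a "bot" that replays the very same delays in shuffled order: an identical marginal distribution with no conditional structure.

```python
import math
import random

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def difficulty_timing_correlation(difficulty, times_ms):
    """Spearman rank correlation: near zero means timing ignores how hard the spot is."""
    return pearson(ranks(difficulty), ranks(times_ms))

# Demo on synthetic data (assumed relationship, not measured behavior).
rng = random.Random(1)
difficulty = [rng.random() for _ in range(2000)]
# Synthetic human: slower on harder spots, plus lognormal noise.
human_ms = [rng.lognormvariate(math.log(1500) + 1.2 * d, 0.5) for d in difficulty]
# Bot replaying human-looking delays: same values, order decoupled from difficulty.
bot_ms = human_ms[:]
rng.shuffle(bot_ms)

print(difficulty_timing_correlation(difficulty, human_ms))
print(difficulty_timing_correlation(difficulty, bot_ms))
```

Both series would pass a marginal test, since they contain exactly the same values; only the conditional feature tells them apart.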
A few features that fall out of timing analysis specifically:
- Variance vs. difficulty correlation. Humans slow down on close decisions and speed up on obvious ones. A flat response time across spots of wildly different difficulty is unnatural.
- Quantization. Delays that snap to round intervals (every action at a multiple of 200ms) betray a discrete internal scheduler.
- Pre-action leakage. Acting before it is plausible that a human could have read the board — sub-perceptual reaction times on complex spots.
- Absence of drift. Human timing wanders over a session as attention fades; a steady distribution across six hours is a flag.
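The quantization, leakage, and drift features above reduce to simple per-player statistics. In this sketch the 200 ms grid, the 5 ms tolerance, the 300 ms "too fast to have read the board" floor, and the per-decision `complex_spot` flag are hypothetical parameters chosen for illustration, not calibrated values.

```python
import statistics

def quantization_rate(times_ms, grid_ms=200, tol_ms=5):
    """Share of actions landing within tol_ms of a multiple of grid_ms (illustrative grid)."""
    hits = 0
    for t in times_ms:
        offset = t % grid_ms
        if min(offset, grid_ms - offset) <= tol_ms:
            hits += 1
    return hits / len(times_ms)

def leakage_rate(times_ms, complex_spot, floor_ms=300):
    """Share of complex-spot decisions faster than floor_ms (hypothetical reaction floor)."""
    fast = [t < floor_ms for t, is_complex in zip(times_ms, complex_spot) if is_complex]
    return sum(fast) / len(fast) if fast else 0.0

def session_drift(times_ms):
    """Relative change in median time-to-act between the first and last third of a session."""
    k = len(times_ms) // 3
    early = statistics.median(times_ms[:k])
    late = statistics.median(times_ms[-k:])
    return (late - early) / early
```

Continuous human timings hit the grid by chance roughly 2 × tol_ms / grid_ms of the time (5% with these parameters), so the useful quantity is the excess over that baseline, not the raw rate. A drift near zero over a six-hour session is the "absence of drift" flag.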
Signal 2 — Bet-sizing entropy
Solvers output precise, repeatable sizings — 33% pot, 67% pot, 125% pot — and they hit them to the chip across thousands of hands. Humans are messier: they round to convenient numbers, drift with mood and stack size, and rarely reproduce the same exact sizing in the same spot every time.
Measuring the entropy of a player's sizing choices captures this. Two failure modes both look suspicious:
- Entropy too low. The player snaps to a tiny set of exact fractions with machine precision — a solver fingerprint.
- Strategy too clean. Sizings track GTO frequencies far more tightly than any human could drill into reflex through study, especially across many simultaneous tables.
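A minimal sketch of the low-entropy check: express each bet as a fraction of the pot, bucket it, and take the Shannon entropy of the bucket frequencies. It assumes bets and pots arrive paired per decision in the same chip units; the 1%-of-pot bucket width is an assumption, and a finer width makes the measure more sensitive to "to the chip" precision. The "strategy too clean" case would also need solver frequencies for comparison, which this sketch does not model.

```python
import math
from collections import Counter

def sizing_entropy(bets, pots, resolution=0.01):
    """Shannon entropy (bits) of bet sizes as pot fractions, bucketed at `resolution`."""
    buckets = Counter(round(bet / pot / resolution) for bet, pot in zip(bets, pots))
    n = sum(buckets.values())
    return -sum((c / n) * math.log2(c / n) for c in buckets.values())
```

Measuring in pot fractions is deliberate: a human who rounds to convenient chip amounts scatters across many buckets because pot sizes vary, while a solver that always bets 33%, 67%, or 125% collapses into three buckets and about 1.58 bits.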
On a provably-fair site the recorded hand histories are tamper-evident, which makes this kind of long-run statistical aggregation more trustworthy — the data you're measuring is verifiably real, as covered in the note on provable fairness.
Signal 3 — Multi-tabling and session patterns
Automation's economic point is scale, and scale leaves prints:
| Pattern | Why it flags | Strength |
|---|---|---|
| Table count beyond human capacity | Consistent A-game across 16+ tables exceeds attainable human attention | High |
| Marathon sessions, no fatigue drift | Decision quality and timing don't degrade over many hours | Medium |
| Hardware / client fingerprint reuse | Many “players” share one machine signature | High |
| Headless or instrumented client artifacts | Automation hooks visible at the client layer | High |
| Synchronized action across seats | Several accounts act on a shared clock | Medium |
None of these is conclusive alone — a strong human grinder can multi-table heavily, and shared hardware can mean a household. That is exactly why they feed a combined score rather than triggering action directly.
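Table count, and how long it stays high, can be measured with a sweep over seat intervals. This sketch assumes each seat is logged for one account as a hypothetical `(start, end)` pair of timestamps in seconds; the threshold (such as 16 tables) is a parameter. Because heavy multi-tabling alone does not separate a strong grinder from a bot, the output is one input to the combined score.

```python
def _events(sessions):
    """+1 when a table opens, -1 when it closes; closes sort before opens at the same time."""
    events = [(start, 1) for start, _ in sessions] + [(end, -1) for _, end in sessions]
    return sorted(events)

def max_concurrent_tables(sessions):
    """Peak number of simultaneously open tables."""
    current = peak = 0
    for _, delta in _events(sessions):
        current += delta
        peak = max(peak, current)
    return peak

def time_at_or_above(sessions, threshold):
    """Total time (same units as the timestamps) spent with at least `threshold` tables open."""
    current, total, prev = 0, 0, None
    for t, delta in _events(sessions):
        # The table count is constant between consecutive events.
        if prev is not None and current >= threshold:
            total += t - prev
        current += delta
        prev = t
    return total
```

Tracking time at or above the threshold, not just the peak, captures the "consistent" part of the first row: a brief spike is ordinary, hours at that load are not.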
Combining weak signals
The architecture detection teams converge on looks like this:
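- Per-player features. Timing-shape divergence, sizing entropy, multi-table / fatigue metrics, and client + network fingerprints.
- Weighted anomaly score. The features are combined into a single score per player.
- Threshold. Players above it are queued for human review.
- Manual confirmation. Action is taken only after a reviewer confirms.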
The design principle is that any single feature is noisy and beatable, but they are hard to beat simultaneously. A bot tuned to pass the timing test may give itself away on sizing entropy; one that randomizes both may still trip the multi-tabling or fingerprint checks. Crucially, the final step is human — automated scoring decides where to look, not who gets penalized, which keeps false positives from hitting legitimate grinders.
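The scoring and triage stages can be sketched as a weighted sum of standardized features followed by a threshold. Everything specific here is an assumption: the feature names are hypothetical, z-scores against the current population are one normalization choice among several, and the weights are illustrative, set so that mechanical-impossibility signals dominate skill-like ones.

```python
import statistics

# Hypothetical feature names, each oriented so that larger = more bot-like.
# Illustrative weights: mechanical-impossibility signals dominate skill-like ones.
WEIGHTS = {
    "timing_divergence": 1.0,    # timing-shape divergence from a human reference
    "low_sizing_entropy": 0.5,   # skill-like: strong regulars score here too
    "table_load": 1.0,           # peak concurrent tables
    "shared_fingerprint": 3.0,   # mechanical: client/hardware reuse across accounts
    "sub_perceptual_rate": 3.0,  # mechanical: faster than a human could read the spot
}

def zscores(values):
    """Standardize against the population so features on different scales can be summed."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]

def anomaly_scores(features):
    """features: {account: {feature_name: value}} -> {account: weighted anomaly score}."""
    accounts = list(features)
    scores = dict.fromkeys(accounts, 0.0)
    for name, weight in WEIGHTS.items():
        column = zscores([features[a][name] for a in accounts])
        for account, z in zip(accounts, column):
            scores[account] += weight * z
    return scores

def review_queue(features, threshold):
    """Accounts at or above threshold, most anomalous first: triage for humans, not a verdict."""
    scores = anomaly_scores(features)
    flagged = [a for a, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda a: scores[a], reverse=True)
```

With a population of ordinary players, a heavy-multi-tabling grinder with tight sizing, and an account showing shared fingerprints plus sub-perceptual reactions, a high threshold queues only the latter. A lower threshold also surfaces the grinder, which is exactly the case the human reviewer exists to clear.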
The false-positive problem
The hard constraint on any detection system is not catching bots — it is not punishing humans. A skilled, study-heavy regular who multi-tables and plays close to GTO will, by construction, look somewhat bot-like: tight sizing, fast accurate decisions, long sessions. Treating “plays well” as “plays automated” would gut the player base of exactly the customers a room wants. This is why mature systems weight mechanical impossibility signals — sub-human reaction times, shared client fingerprints, perfect cross-table synchronization — far above skill signals like good sizing. You can be a brilliant human; you cannot be in two places acting on the same millisecond.
The arms race
Detection and evasion co-evolve. Each defensive signal invites a countermeasure, which invites a meta-signal:
| Evasion | Counter-signal |
|---|---|
| Randomized human-like delays | Check conditional timing — slow on hard spots, not just on average |
| Deliberate sizing “mistakes” | Errors that never cost EV are themselves a tell |
| Rotating accounts and IPs | Hardware fingerprint and play-style clustering across accounts |
| Capping table count to look human | Lower volume per account, but identical style across the cluster |
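The "mistakes that never cost EV" counter-signal can be sketched by looking at how expensive a player's deviations from the best action are. This assumes a per-decision EV for the chosen action and for the best action is available from a solver (a hypothetical input; the note does not specify one), expressed in big blinds; the 0.5 bb "costly" cutoff is illustrative.

```python
def mistake_profile(decisions, costly_bb=0.5):
    """decisions: list of (ev_chosen_bb, ev_best_bb), one per decision (assumed solver output).

    Returns (deviation_rate, costly_share): how often the player leaves the best action,
    and what fraction of those deviations cost more than costly_bb.
    """
    losses = [best - chosen for chosen, best in decisions]
    deviations = [loss for loss in losses if loss > 1e-9]
    if not deviations:
        return 0.0, 0.0
    costly = sum(1 for loss in deviations if loss > costly_bb)
    return len(deviations) / len(decisions), costly / len(deviations)
```

Human errors include a tail of expensive ones; deliberate "mistakes" injected to look human tend to cluster near zero cost, so a healthy deviation rate with an empty costly share is the tell.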
The asymmetry favors the defender in the long run: the bot must look human on every axis at once and over thousands of hands, while the detector only needs one durable inconsistency. The more an operator tunes a bot to defeat one signal, the more it tends to distort another. Perfect mimicry across the full feature set, sustained over a large sample, is the actual bar — and it is a high one.
Feature engineering, in practice
For anyone building one of these systems, the work is less about exotic models and more about engineering features that are robust to mimicry. A few principles that tend to hold:
- Prefer conditional features over marginal ones. “Average decision time” is trivially spoofable. “Decision time as a function of spot difficulty” is much harder to fake because it requires the bot to model difficulty the way a human experiences it.
- Aggregate over long windows. Single hands are noise. The signal lives in distributions over hundreds or thousands of decisions, where a human's natural variance and a script's artificial regularity diverge.
- Cross-reference accounts. Many evasion strategies survive at the single-account level but collapse when you cluster by play style, timing fingerprint, and hardware across the whole population.
- Keep the human in the loop. The model's job is triage — surface the most anomalous cases for review. Treating a score as an automatic verdict is how you ban your best customers.
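Cross-referencing accounts by hardware can be sketched with union-find: any two accounts that ever share a fingerprint end up in the same cluster, including transitive links through a third account. The input shape (a mapping from account to hypothetical fingerprint hashes) is an assumption, and this covers only the hardware dimension; clustering by play style or timing fingerprint would need a similarity measure on top.

```python
def fingerprint_clusters(account_fingerprints):
    """account_fingerprints: {account: iterable of fingerprint hashes}.

    Returns clusters (sets of size >= 2) of accounts linked directly or transitively
    through a shared fingerprint.
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_owner = {}  # fingerprint -> first account seen with it
    for account, fingerprints in account_fingerprints.items():
        parent.setdefault(account, account)
        for fp in fingerprints:
            if fp in first_owner:
                union(first_owner[fp], account)
            else:
                first_owner[fp] = account

    groups = {}
    for account in parent:
        groups.setdefault(find(account), set()).add(account)
    return [group for group in groups.values() if len(group) >= 2]
```

A cluster is a lead, not a finding: shared hardware can mean a household, so clusters feed the combined score and the review queue rather than triggering action.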
The output is rarely a clean “bot / not bot” label. It is a ranked queue of accounts whose behavior is statistically improbable for a human, ordered by confidence, with the impossible-for-a-human signals (sub-perceptual reaction times, shared fingerprints, synchronized action) weighted heaviest because they are the hardest to explain away innocently.
What detection cannot do
It is honest to name the limits. Behavioral detection is statistical, so it is probabilistic — it produces confidence, not certainty, and a sufficiently restrained bot (low volume, heavily randomized, single account, clean hardware) can stay under the threshold for a long time. The trade-off the bot operator faces is that every step taken to evade detection — fewer tables, more randomness, deliberate suboptimality — directly erodes the economic edge that motivated automation in the first place. Detection doesn't need to be perfect; it needs to make undetectable botting unprofitable. That is a softer but more achievable goal, and it is the one well-run systems actually optimize for.
What this means for crypto poker specifically
The detection problem on a provably-fair site is, mechanically, the same behavioral problem as on any poker room — the on-chain layer changes the trust in the data, not the method of the verdict. If anything, the verifiable hand-history record is an asset for detection: cleaner, tamper-evident data makes the statistics more reliable. But the conclusion from the companion note holds: the cryptography proves the deal, and behavioral analysis is the only thing that speaks to the players.