
Behavioral Signals for the Detection of Automated Poker Play

Summary: Because cryptography can't identify a bot, detection is a behavioral problem. The strongest signals are statistical, not visual: the shape of a player's action-timing distribution, the entropy of their bet sizing, and multi-tabling patterns that exceed human capacity. No single signal is proof; detection systems combine many weak signals into a confidence score, then route the high-confidence cases to human review.

Detection of automated play does not proceed hand by hand. It proceeds at the level of distributions: each seat's behavior is compared against a reference population of human play, and divergence is accumulated over many observations. This article covers the three signal families that carry most of the classifier weight in operator-side detection pipelines.

Signal 1 — Action-timing distributions

The single most reliable behavioral signal is time-to-act: how long a player takes to make each decision. Humans produce a broad, right-skewed distribution — fast on trivial spots, slow on hard ones, with natural noise from attention, fatigue, and tabbing between tables. A naive bot, by contrast, acts on a tight internal clock and produces a sharp spike.

The giveaway is not the average — a bot can be tuned to a human-looking mean. It's the shape. A human histogram is wide and lumpy; a scripted one is narrow, or shows artificial regularity like quantization to fixed intervals. Plot enough decisions and the two separate cleanly:

Figure: Human vs. scripted time-to-act for the same decision spot, 200 samples. The human distribution is wide and right-skewed; the scripted one collapses into a spike.

Sophisticated bots fight back by adding randomized delays drawn from a human-like distribution. That raises the bar but rarely closes the gap, because human timing is context-dependent — correlated with hand difficulty, stack depth, and prior action. Reproducing the marginal distribution is easy; reproducing the conditional structure (slow exactly when the spot is hard) is much harder, and mismatches there are themselves a signal.

A few features that fall out of timing analysis specifically:

Variance vs. difficulty correlation. Humans slow down on close decisions and speed up on obvious ones. A flat response time across spots of wildly different difficulty is unnatural.
Quantization. Delays that snap to round intervals (every action at a multiple of 200ms) betray a discrete internal scheduler.
Pre-action leakage. Acting before it is plausible that a human could have read the board — sub-perceptual reaction times on complex spots.
Absence of drift. Human timing wanders over a session as attention fades; a steady distribution across six hours is a flag.
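Two of the features above, the difficulty correlation and quantization, can be sketched in a few lines. This is a minimal illustration, not a production detector: the function names, the per-decision difficulty score, and the 200 ms step and 5 ms tolerance defaults are all assumptions made for the example.

```python
import math
from statistics import mean

def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    rx, ry = rank(xs), rank(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def quantization_share(delays_ms, step_ms=200, tol_ms=5):
    """Fraction of delays that fall within tol_ms of a multiple of step_ms."""
    hits = 0
    for d in delays_ms:
        r = d % step_ms
        if min(r, step_ms - r) <= tol_ms:
            hits += 1
    return hits / len(delays_ms)
```

Called as `spearman(difficulty, delays_ms)`, a value near zero over a large sample corresponds to the flat response described above, while a `quantization_share` close to 1.0 points to a discrete scheduler. Both are only meaningful over hundreds of decisions, and how difficulty is scored is left open here.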

Signal 2 — Bet-sizing entropy

Solvers output precise, repeatable sizings — 33% pot, 67% pot, 125% pot — and they hit them to the chip across thousands of hands. Humans are messier: they round to convenient numbers, drift with mood and stack size, and rarely reproduce the same exact sizing in the same spot every time.

Measuring the entropy of a player's sizing choices captures this. Two failure modes both look suspicious:

Entropy too low. The player snaps to a tiny set of exact fractions with machine precision — a solver fingerprint.
Strategy too clean. Sizings track GTO frequencies far more tightly than any human could drill into reflex, especially across many simultaneous tables.
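As a rough sketch of the low-entropy check, the Shannon entropy of a player's bet-to-pot fractions can be computed after bucketing them. The function name and the 5% bin width are assumptions for illustration; a real system would choose the binning per street and per spot.

```python
import math
from collections import Counter

def sizing_entropy(bets, pots, bin_width=0.05):
    """Shannon entropy in bits of bet/pot fractions, bucketed by bin_width."""
    bins = Counter(round((b / p) / bin_width) for b, p in zip(bets, pots))
    n = sum(bins.values())
    # log2(n / c) is the surprisal of a bin holding c of the n observations.
    return sum((c / n) * math.log2(n / c) for c in bins.values())
```

A player who always bets the same fraction scores 0 bits; one who splits evenly between two sizings scores 1 bit. The number only means something relative to a human reference population measured with the same binning.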

On a provably-fair site the recorded hand histories are tamper-evident, which makes this kind of long-run statistical aggregation more trustworthy — the data you're measuring is verifiably real, as covered in the note on provable fairness.

Signal 3 — Multi-tabling and session patterns

Automation's economic point is scale, and scale leaves prints:

Pattern	Why it flags	Strength
Table count beyond human capacity	Consistent A-game across 16+ tables exceeds attainable human attention	High
Marathon sessions, no fatigue drift	Decision quality and timing don't degrade over many hours	Medium
Hardware / client fingerprint reuse	Many “players” share one machine signature	High
Headless or instrumented client artifacts	Automation hooks visible at the client layer	High
Synchronized action across seats	Several accounts act on a shared clock	Medium

None of these is conclusive alone — a strong human grinder can multi-table heavily, and shared hardware can mean a household. That is exactly why they feed a combined score rather than triggering action directly.
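The synchronized-action row can be illustrated with a simple pairwise check: for two accounts, count how often one acts within a few milliseconds of the other. This is a sketch under assumptions; the function name and the 50 ms tolerance are hypothetical, and timestamps are taken to be in milliseconds on a common clock.

```python
from bisect import bisect_left

def sync_share(times_a, times_b, tol_ms=50):
    """Share of A's action timestamps with a B timestamp within tol_ms."""
    b = sorted(times_b)
    if not times_a or not b:
        return 0.0
    hits = 0
    for t in times_a:
        # First B timestamp at or after the start of the tolerance window.
        i = bisect_left(b, t - tol_ms)
        if i < len(b) and b[i] <= t + tol_ms:
            hits += 1
    return hits / len(times_a)
```

Two busy accounts will coincide occasionally by chance, so the share should be compared with what unrelated account pairs of similar volume produce rather than read as an absolute number.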

Combining weak signals

The architecture detection teams converge on looks like this:

per-player features
    timing-shape divergence
    sizing entropy
    multi-table / fatigue metrics
    client + network fingerprints
      → weighted anomaly score
      → threshold → queue for human review
      → manual confirmation → action

The design principle is that any single feature is noisy and beatable, but they are hard to beat simultaneously. A bot tuned to pass the timing test may give itself away on sizing entropy; one that randomizes both may still trip the multi-tabling or fingerprint checks. Crucially, the final step is human — automated scoring decides where to look, not who gets penalized, which keeps false positives from hitting legitimate grinders.
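The scoring and triage steps of that pipeline might look like the sketch below. Every name, weight, and threshold here is hypothetical; the only point carried over from the text is that the score produces a review queue, not a penalty.

```python
from dataclasses import dataclass

# Hypothetical weights; each signal is assumed pre-scaled to [0, 1].
WEIGHTS = {
    "timing_shape": 0.25,
    "sizing_entropy": 0.15,
    "multi_table": 0.20,
    "fingerprint": 0.40,
}
REVIEW_THRESHOLD = 0.6  # assumed cut-off for human review

@dataclass
class PlayerFeatures:
    player_id: str
    signals: dict  # signal name -> anomaly value in [0, 1]

def anomaly_score(player):
    """Weighted sum of the per-signal anomaly values; missing signals count as 0."""
    return sum(w * player.signals.get(name, 0.0) for name, w in WEIGHTS.items())

def review_queue(players, threshold=REVIEW_THRESHOLD):
    """Return (score, player_id) pairs at or above threshold, highest first."""
    scored = [(anomaly_score(p), p.player_id) for p in players]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)
```

The function stops at the queue on purpose: confirmation and any action on an account stay with a human reviewer.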

The false-positive problem

The hard constraint on any detection system is not how many bots it catches but how few humans it punishes. A skilled, study-heavy regular who multi-tables and plays close to GTO will, by construction, look somewhat bot-like: tight sizing, fast accurate decisions, long sessions. Treating “plays well” as “plays automated” would gut the player base of exactly the customers a room wants. This is why mature systems weight mechanical-impossibility signals (sub-human reaction times, shared client fingerprints, perfect cross-table synchronization) far above skill signals like good sizing. You can be a brilliant human; you cannot be in two places acting on the same millisecond.

The arms race

Detection and evasion co-evolve. Each defensive signal invites a countermeasure, which invites a meta-signal:

Evasion	Counter-signal
Randomized human-like delays	Check conditional timing — slow on hard spots, not just on average
Deliberate sizing “mistakes”	Errors that never cost EV are themselves a tell
Rotating accounts and IPs	Hardware fingerprint and play-style clustering across accounts
Capping table count to look human	Lower volume per account, but identical style across the cluster

The asymmetry favors the defender in the long run: the bot must look human on every axis at once and over thousands of hands, while the detector only needs one durable inconsistency. The more a bot operator tunes a bot to defeat one signal, the more it tends to distort another. Perfect mimicry across the full feature set, sustained over a large sample, is the actual bar, and it is a high one.

Feature engineering, in practice

For anyone building one of these systems, the work is less about exotic models and more about engineering features that are robust to mimicry. A few principles that tend to hold:

Prefer conditional features over marginal ones. “Average decision time” is trivially spoofable. “Decision time as a function of spot difficulty” is much harder to fake because it requires the bot to model difficulty the way a human experiences it.
Aggregate over long windows. Single hands are noise. The signal lives in distributions over hundreds or thousands of decisions, where a human's natural variance and a script's artificial regularity diverge.
Cross-reference accounts. Many evasion strategies survive at the single-account level but collapse when you cluster by play style, timing fingerprint, and hardware across the whole population.
Keep the human in the loop. The model's job is triage — surface the most anomalous cases for review. Treating a score as an automatic verdict is how you ban your best customers.
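The cross-referencing principle can be illustrated with the simplest possible case, grouping accounts by a shared hardware fingerprint. This is a sketch only: the function name, the `(account_id, fingerprint)` input shape, and the minimum of three accounts (chosen so that a two-person household does not surface by default) are assumptions.

```python
from collections import defaultdict

def fingerprint_clusters(sightings, min_accounts=3):
    """Group accounts by fingerprint; keep groups with at least min_accounts.

    sightings: iterable of (account_id, fingerprint) pairs.
    """
    by_fp = defaultdict(set)
    for account, fp in sightings:
        by_fp[fp].add(account)
    return {
        fp: sorted(accounts)
        for fp, accounts in by_fp.items()
        if len(accounts) >= min_accounts
    }
```

In practice the same grouping idea extends to timing fingerprints and play-style vectors, where clusters come from a similarity measure rather than exact matches. As with every other signal, a cluster is a lead for review, not a verdict.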

The output is rarely a clean “bot / not bot” label. It is a ranked queue of accounts whose behavior is statistically improbable for a human, ordered by confidence, with the impossible-for-a-human signals (sub-perceptual reaction times, shared fingerprints, synchronized action) weighted heaviest because they are the hardest to explain away innocently.

What detection cannot do

It is honest to name the limits. Behavioral detection is statistical, so it is probabilistic — it produces confidence, not certainty, and a sufficiently restrained bot (low volume, heavily randomized, single account, clean hardware) can stay under the threshold for a long time. The trade-off the bot operator faces is that every step taken to evade detection — fewer tables, more randomness, deliberate suboptimality — directly erodes the economic edge that motivated automation in the first place. Detection doesn't need to be perfect; it needs to make undetectable botting unprofitable. That is a softer but more achievable goal, and it is the one well-run systems actually optimize for.

What this means for crypto poker specifically

The detection problem on a provably-fair site is, mechanically, the same behavioral problem as on any poker room — the on-chain layer changes the trust in the data, not the method of the verdict. If anything, the verifiable hand-history record is an asset for detection: cleaner, tamper-evident data makes the statistics more reliable. But the conclusion from the companion note holds: the cryptography proves the deal, and behavioral analysis is the only thing that speaks to the players.

Raul Moriarty

Poker Software Expert & Communications Lead at Poker Bot AI

Writes about poker automation, game-integrity systems, and how detection actually works in practice — separating the cryptography from the behavioral side.