Every tell on this site is measured, not vibed. The short version:
We generate text from current models (and their 2024/25 predecessors) on prompts matched to pre-2022 human writing — same topics, same genre, same seeds. For example: real 2019 Hacker News story titles, where we have the comments humans actually wrote; or cooking questions where we have the answers humans actually gave. What models overuse relative to matched humans is a candidate tell.
Every candidate is then checked against a canon of acclaimed pre-AI prose — a curated set of well-regarded writers, all pre-2022. (We keep the exact list private, along with our prompt battery, so the measurements stay hard to game.) A pattern that also fires on great human writers gets demoted or shipped with a warning. This is why we won't tell you the em dash means AI — it doesn't.
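As a rough illustration of the canon check, here is a minimal sketch. It assumes a candidate tell can be written as a regular expression and that usage is compared as a rate per 1,000 words. The function names, the "keep"/"demote" labels, and the 0.5 ratio threshold are hypothetical choices for the example, not our production values.

```python
import re

def rate_per_1k(pattern: str, texts: list[str]) -> float:
    """Matches of `pattern` per 1,000 whitespace-separated words in `texts`."""
    words = sum(len(t.split()) for t in texts)
    hits = sum(len(re.findall(pattern, t, flags=re.IGNORECASE)) for t in texts)
    return 1000.0 * hits / words if words else 0.0

def canon_verdict(pattern: str, model_texts: list[str], canon_texts: list[str],
                  max_canon_ratio: float = 0.5) -> str:
    """Return 'keep' or 'demote' (hypothetical labels).

    A candidate is demoted when acclaimed human prose uses the pattern at a
    rate close to (or above) the model rate. The threshold is an assumption.
    """
    model_rate = rate_per_1k(pattern, model_texts)
    canon_rate = rate_per_1k(pattern, canon_texts)
    if model_rate == 0.0:
        return "demote"  # models never produce it, so there is nothing to flag
    return "demote" if canon_rate / model_rate > max_canon_ratio else "keep"

# Toy data only; \u2014 is the em dash.
model_texts = ["This is a rich tapestry of ideas \u2014 truly a tapestry."]
canon_texts = ["She paused \u2014 then laughed \u2014 and left the quiet room."]
```

With this toy data the em dash is demoted, because the canon sample uses it more often than the model sample does, while `r"\btapestry\b"` is kept, because it never appears in the canon sample.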
The statistics (n-gram keyness, wildcarded construct-frames, formatting rates, sentence-cadence metrics) know nothing about what people say the tells are. Reddit threads and press lists are used only as an answer key afterwards — and as the “attested” evidence axis. When measurement contradicts folk wisdom, we publish the correction (see the em dash, staccato sentences).
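To make "n-gram keyness" concrete, the sketch below ranks n-grams that models overuse relative to matched human text, using the log-likelihood (G2) statistic that is common in corpus linguistics. Which keyness statistic we actually use is not stated here, so treat G2, the whitespace tokenizer, and the function names as assumptions. Construct-frames, formatting rates, and cadence metrics are not covered.

```python
import math
from collections import Counter

def ngrams(text: str, n: int) -> list[str]:
    """Lowercased whitespace-token n-grams (a deliberately crude tokenizer)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def keyness(model_texts: list[str], human_texts: list[str], n: int = 2):
    """Rank n-grams overused in model text by log-likelihood (G2), highest first."""
    model = Counter(g for t in model_texts for g in ngrams(t, n))
    human = Counter(g for t in human_texts for g in ngrams(t, n))
    n_model, n_human = sum(model.values()), sum(human.values())
    if n_model == 0 or n_human == 0:
        return []
    scores = {}
    for gram, a in model.items():
        b = human.get(gram, 0)
        if a / n_model <= b / n_human:
            continue  # not overused by models, so not a candidate tell
        # Expected counts if both corpora used the n-gram at the same rate.
        e_model = n_model * (a + b) / (n_model + n_human)
        e_human = n_human * (a + b) / (n_model + n_human)
        g2 = 2 * (a * math.log(a / e_model)
                  + (b * math.log(b / e_human) if b else 0.0))
        scores[gram] = g2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpora, invented for the example.
model_texts = ["great question let me explain", "great question here is why"]
human_texts = ["let me explain why this breaks", "here is why it fails"]
ranked = keyness(model_texts, human_texts)
```

On these toy corpora the top-ranked bigram is "great question", which occurs twice in the model sample and never in the human one. Nothing in the computation consults a list of known tells, which is the property the paragraph above describes.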
Tells are dated and versioned: emerging → active → saturated → fading → retired. Comparing current models against their predecessors on identical prompts shows which tells are being trained away (“Great question!”, the 2024 hedging cluster) and which are rising. The list you're reading will be wrong in a year — that's the point, and we'll keep measuring.
Detection is deterministic pattern matching against the tells database: your text is scored in the request and never stored, and no LLM ever sees it. One match means almost nothing; density and co-occurrence are the signal. It's a smell test, not a verdict.
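A minimal sketch of what density and co-occurrence could look like in code. The three patterns in `TELLS` are a hypothetical stand-in for the tells database, and the two summary numbers (hits per 1,000 words, count of distinct tells that fired) are one plausible reading of "density and co-occurrence", not the exact scoring we run.

```python
import re

# Hypothetical stand-in for the tells database: name -> compiled pattern.
TELLS = {
    "delve": re.compile(r"\bdelv(?:e|es|ing)\b", re.IGNORECASE),
    "not_just_but": re.compile(r"\bnot just\b.{1,60}?\bbut\b", re.IGNORECASE),
    "great_question": re.compile(r"\bgreat question\b", re.IGNORECASE),
}

def score(text: str) -> dict:
    """Deterministic scoring: same text in, same numbers out, nothing stored."""
    words = len(text.split())
    hits = {name: sum(1 for _ in rx.finditer(text)) for name, rx in TELLS.items()}
    total = sum(hits.values())
    return {
        "hits": hits,
        # Density: total matches per 1,000 words.
        "density_per_1k": 1000.0 * total / words if words else 0.0,
        # Co-occurrence: how many different tells fired in the same text.
        "distinct_tells": sum(1 for count in hits.values() if count),
    }

dense = score("Great question! Let's delve into why this is not just fast but also safe.")
sparse = score("We delve into the archive every spring.")
```

Here `sparse` fires a single tell once, which by itself says very little, while `dense` fires three different tells in one sentence. Any threshold for turning these numbers into a warning would be a further assumption.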
Questions about the method? We're happy to talk — but the prompt battery, corpora, and canon roster stay private. A measurement instrument you can read is a measurement instrument you can game.