Guide

Do AI detectors work? How to think about AI-generated content

"Can this tell if something was written by AI?" is a booming search query among teachers, editors, and hiring managers. The uncomfortable answer: AI detectors are unreliable enough that you should not base consequential decisions on them. Here's what they actually do, why they fail, and what to do instead.

How detectors claim to work

Most detectors estimate how "predictable" text is — AI-generated writing tends to be statistically smooth and average, so detectors flag low-surprise text as machine-made. It's a plausible signal in the aggregate, but it's a probability estimate dressed up as a verdict. A score of "98% AI" is not evidence; it's a guess from a model that can be wrong in both directions.
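To make "predictability" concrete, here is a minimal sketch of the idea, not how any particular detector is built. It assumes a toy word-frequency (unigram) model with add-one smoothing, trained on a made-up reference sentence; real tools would use a large language model instead. The function names and sample texts are hypothetical. The score it produces is perplexity: lower means the text was less surprising to the model.

```python
import math
from collections import Counter


def train_unigram(corpus: str) -> tuple[Counter, int]:
    """Count word frequencies in a reference corpus (toy stand-in for a language model)."""
    tokens = corpus.lower().split()
    return Counter(tokens), len(tokens)


def perplexity(text: str, counts: Counter, total: int) -> float:
    """Return the perplexity of `text` under the unigram model.

    Uses add-one smoothing so unseen words get a small, non-zero probability.
    Lower perplexity means the model found the text more predictable.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob_sum = 0.0
    for tok in tokens:
        prob = (counts[tok] + 1) / (total + vocab_size)
        log_prob_sum += math.log(prob)
    # Perplexity is exp of the average negative log-probability per token.
    return math.exp(-log_prob_sum / len(tokens))


# Hypothetical reference text and samples, chosen only for illustration.
reference = "the cat sat on the mat the dog sat on the rug"
counts, total = train_unigram(reference)

predictable = perplexity("the cat sat on the mat", counts, total)
surprising = perplexity("quantum ferrets juggle marmalade", counts, total)

print(f"predictable text perplexity: {predictable:.2f}")
print(f"surprising text perplexity:  {surprising:.2f}")
```

Note what the number is and is not: it measures how familiar the wording is to one model, so plain, conventional human writing scores as "predictable" too. Turning that score into "98% AI" requires a threshold someone chose, which is where the guesswork enters.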

Why they get it wrong both ways

False positives are the serious problem: clear, simple, formulaic human writing — often from non-native speakers, or anyone writing plainly — gets flagged as AI, with real consequences for students and job seekers. False negatives are trivial to produce: lightly editing or paraphrasing AI text, or asking the model to write less predictably, defeats most detectors. A tool that punishes the innocent and misses the guilty is not a reliable gate.
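A quick back-of-the-envelope calculation shows why even a small false positive rate matters. Every number below is an assumption made up for illustration (class size, share of AI-written work, and both error rates); none comes from a measured detector. The sketch simply counts how many flags would land on human writers.

```python
def flag_breakdown(n_docs: int, ai_share: float,
                   true_positive_rate: float,
                   false_positive_rate: float) -> dict[str, float]:
    """Estimate how many flags are correct vs. wrong for a batch of documents.

    All inputs are hypothetical:
      n_docs              - number of documents checked
      ai_share            - fraction that really is AI-written
      true_positive_rate  - fraction of AI-written docs the detector flags
      false_positive_rate - fraction of human-written docs the detector flags
    """
    ai_docs = n_docs * ai_share
    human_docs = n_docs - ai_docs
    true_flags = ai_docs * true_positive_rate
    false_flags = human_docs * false_positive_rate
    total_flags = true_flags + false_flags
    # Share of all flags that point at a human writer.
    wrong_share = false_flags / total_flags if total_flags else 0.0
    return {
        "true_flags": true_flags,
        "false_flags": false_flags,
        "missed_ai_docs": ai_docs - true_flags,
        "wrong_share_of_flags": wrong_share,
    }


# Illustrative scenario only: 1,000 essays, 10% AI-written,
# a detector that catches 90% of those and wrongly flags 5% of human work.
result = flag_breakdown(1000, 0.10, 0.90, 0.05)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under these made-up numbers, roughly one flag in three points at a human writer, even though the detector looks accurate on paper. The lower the real share of AI-written work, the worse that ratio gets.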

Watermarking isn't a rescue

Proposals to invisibly watermark AI output sound promising but don't solve the real-world problem: watermarks are vendor-specific, often stripped by editing or translation, and absent entirely from open models. There is currently no robust, universal way to prove text was AI-generated. Treat any product that claims certainty with deep skepticism — the underlying science doesn't support certainty.

What to do instead

Shift from detection to process. If you care whether work is a person's own, ask about it: request drafts and version history, talk through the reasoning, or assess in conditions you control. In hiring and education, design tasks that reward understanding over output. The goal AI detectors promise — verifying authentic human effort — is better served by how you evaluate than by a probability score.

The deeper problem: it's an arms race

Detection keeps losing because it's adversarial by nature. Every improvement in detectors is met by better, more human-sounding models and trivial evasion techniques, so accuracy that looks good today degrades as the generators improve. Building a grade, a policy, or a hiring decision on a moving, beatable signal is fragile by design. The durable response is to stop trying to detect the output and instead value the process and understanding behind it — something a paraphraser can't fake.

If you must use one, use it as a hint

A detector can be one weak input among many, never the deciding factor, and never the basis for an accusation on its own. Tell people how their work is assessed, give them a way to respond to a flag, and remember that a confident-looking percentage carries real risk of being wrong about a real person. Skepticism here protects you as much as them.

The same evidence-first skepticism applies to every AI claim. Read how to read AI benchmarks without getting fooled or browse the full radar.