What is perplexity in AI detection?

Perplexity measures how predictable a text is under a language model. AI-generated text tends to have low perplexity because it was produced by the same type of probabilistic process the detector measures.

What is burstiness and why does it matter?

Burstiness measures the variance in sentence-level complexity. Human writers naturally alternate between flowing, easy passages and dense, complex ones. AI text tends to have unnaturally uniform complexity.

Why do AI detectors give false positives?

False positives happen when human-written text shares statistical properties with AI output. Technical writing, STEM academic prose, and text from non-native English speakers naturally have low perplexity.

Can AI humanizer tools defeat detectors?

Yes. In our testing, 14 humanizer tools achieved bypass rates of 23–91% against 6 major detectors.

How AI Detectors Work: Perplexity, Burstiness & Signals

The Problem Detectors Are Solving

AI language models generate text one token at a time, choosing each token from a probability distribution over likely continuations. This produces text that is, by design, statistically predictable. Human writers more often make unexpected choices that deviate from the statistical average.

Signal 1: Perplexity

PP(text) = exp( -(1/n) × ∑ log P(tᵢ | t₁...tᵢ₋₁) )

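Here n is the number of tokens in the text and P(tᵢ | t₁...tᵢ₋₁) is the probability the scoring model assigns to token tᵢ given the tokens before it. Low perplexity means the model found the text predictable. AI text tends toward low perplexity because it was produced by the same kind of probability distribution the detector measures. Limitation: Technical writing and STEM academic prose naturally have low perplexity, producing false positives.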
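As a minimal sketch, the perplexity formula can be computed directly from per-token log-probabilities. The function name and the sample values below are hypothetical and chosen only for illustration. A real detector would obtain the log-probabilities from its own scoring model, which is not shown here.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities log P(t_i | t_1..t_{i-1})."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    n = len(token_log_probs)
    # Average negative log-probability per token, then exponentiate.
    avg_neg_log_prob = -sum(token_log_probs) / n
    return math.exp(avg_neg_log_prob)


# Hypothetical inputs, not taken from any real model.
predictable = [math.log(0.5)] * 4   # every token had probability 0.5
surprising = [math.log(0.05)] * 4   # every token had probability 0.05

print(round(perplexity(predictable), 2))
print(round(perplexity(surprising), 2))
```

When every token has probability p, the perplexity works out to 1/p, so the first text scores about 2 and the second about 20. The more predictable text gets the lower score.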

Signal 2: Burstiness

burstiness = σ(sentence perplexities) / μ(sentence perplexities)

Human writers are bursty: they alternate between flowing and labored passages, so their sentence-level perplexity varies widely. AI text tends to have unnaturally uniform sentence-level perplexity, which gives a low burstiness score. GPTZero popularized burstiness as a detection signal in 2023.

Signal 3: Vocabulary & N-gram Analysis

AI text tends to exhibit lower lexical diversity, fewer hapax legomena (words that occur only once in the text), and elevated frequency of transition phrases: "furthermore," "it is worth noting," "in conclusion."

False Positives: The Critical Risk

False positive rates in our benchmark ranged from 7% (Originality.ai) to 17% (Sapling AI). Non-native English speakers and STEM writers are disproportionately affected.

Critical Caveat

AI detection results should never be used as the sole basis for an academic integrity accusation. False positive rates of 7–17% are too high for definitive conclusions.

Can AI Detection Be Bypassed?

Yes. In our 2026 bypass study testing 14 humanizer tools against 6 detectors, bypass rates ranged from 23% to 91%. Average accuracy dropped 31 percentage points on humanized text. See full accuracy data.

The most bypass-resistant approaches combine statistical detection with provenance-based methods. Generation-time watermarking such as SynthID and cryptographically signed C2PA content credentials operate at the generation level, not in surface text features.

Signal 4: Model Fingerprinting

Advanced detectors maintain model-specific classifiers. GPT-4o has characteristic paragraph structures and transition phrase frequencies. Claude 3.5 produces more variable, hedged outputs. Originality.ai appears to maintain per-model classifiers — explaining its ability to identify which model generated a specific text.

What the Best Detectors Get Right

The tools scoring highest in our benchmark combine multiple signals rather than relying on perplexity alone, incorporate domain-specific calibration, and provide granular signals rather than binary verdicts. See our accuracy benchmark and comparison table.
