The battle to distinguish human writing from AI-generated text is intensifying. As models like OpenAI’s GPT-4, Anthropic’s Claude, and Google’s Gemini blur the line between machine and human authorship, a team of researchers has developed a new statistical framework to test and improve the “watermarking” methods used to spot machine-made text.

Their work has broad implications for media, education, and business, where detecting machine-written content is becoming increasingly important for fighting misinformation and protecting intellectual property.

“The spread of AI-generated content has sparked big concerns about trust, ownership, and authenticity online,” said Weijie Su, a professor of statistics and data science at Wharton, who co-authored the research.

Published in the Annals of Statistics, a leading journal in the field, the paper examines how often watermarking fails to catch machine-made text — known as a Type II error — and uses advanced math, called large deviation theory, to measure how likely those misses are. It then applies “minimax optimization,” a method for finding the detection rule that holds up best under worst-case conditions, to make detection as accurate as possible.
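To make the Type II error concrete, here is a minimal Python sketch — not the authors’ actual framework — using a hypothetical “green list” watermark, a scheme common in the literature: the detector counts watermark-favored tokens and flags a text when that count would be improbably high for a human writer. The values of gamma, p_wm, n_tokens, and alpha below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

# Hypothetical "green list" watermark: under human text each token is "green"
# with probability gamma; a watermarked model boosts that to p_wm > gamma.
gamma, p_wm = 0.5, 0.62   # illustrative values, not from the paper
n_tokens = 200            # length of the text being tested
alpha = 0.01              # Type I error budget (falsely flagging human text)

# Detection rule: flag the text if the green-token count exceeds a threshold
# calibrated so human text is flagged with probability at most alpha.
threshold = binom.ppf(1 - alpha, n_tokens, gamma)

# Type II error: probability that genuinely watermarked text is *not* flagged,
# estimated here by Monte Carlo simulation.
n_sim = 100_000
green_counts = rng.binomial(n_tokens, p_wm, size=n_sim)
type_ii = np.mean(green_counts <= threshold)
print(f"Estimated Type II error (missed watermark): {type_ii:.4f}")
```

Large deviation theory describes how quickly a miss rate like this one shrinks as the text gets longer, which is the kind of guarantee the paper works out rigorously.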

Spotting AI-made content is a big concern for policymakers. AI-generated text is already being used in journalism, marketing, and law, sometimes openly and sometimes in secret. While it can save time and effort, it also carries risks such as spreading misinformation and violating copyrights.

“The watermark has to be strong enough to detect, but subtle enough that it doesn’t change how the text reads.”— Weijie Su

Do AI Detection Tools Still Work?

Traditional AI detection tools look at writing style and patterns, but the researchers say these no longer work well because AI has gotten so much better at sounding like a real person.

“Today’s AI models are getting so good at mimicking human writing that traditional tools just can’t keep up,” said Qi Long, a professor of biostatistics at the University of Pennsylvania, who co-authored the research.

While the idea of embedding watermarks into the AI’s word selection process isn’t new, the study provides a rigorous way to test how well that approach works.

“Our approach comes with a theoretical guarantee — we can show, through math, how well the detection works and under what conditions it holds up,” Long added.

The researchers, who include Feng Ruan, a professor of statistics and data science at Northwestern University, suggest watermarking could play an important role in shaping how AI-generated content is governed, especially as policymakers push for clearer rules and standards.

Former U.S. President Joe Biden’s October 2023 executive order called for watermarking AI-generated content, tasking the Department of Commerce with helping to develop national standards. In response, companies like OpenAI, Google, and Meta have pledged to build watermarking systems into their models.

“Today’s AI models are getting so good at mimicking human writing that traditional tools just can’t keep up.”— Qi Long

How to Effectively Watermark AI-Generated Content

The study’s authors, who include Penn postdoctoral researchers Xiang Li and Huiyuan Wang, argue that effective watermarking must be hard to remove without changing the meaning of the text, and subtle enough to avoid detection by readers.

“It’s all about balance. The watermark has to be strong enough to detect, but subtle enough that it doesn’t change how the text reads,” said Su.

Rather than tagging specific words, many methods influence how the AI selects them, building the watermark into the model’s writing style. This makes the signal more likely to survive paraphrasing or light edits.
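As a rough illustration of that idea, here is a minimal Python sketch of selection-based watermarking using a hypothetical green-list bias; the delta boost, the seeding rule, and the vocabulary size are assumptions made for illustration, not the specific scheme the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of selection-based watermarking: nudge the model toward a
# pseudo-random "green" half of the vocabulary at each step (a common scheme
# in the literature; not necessarily the paper's method).
vocab_size = 50_000
delta = 2.0  # hypothetical logit boost for green-listed tokens

def green_mask(prev_token: int) -> np.ndarray:
    """Pseudo-randomly split the vocabulary into green/red halves,
    seeded by the previous token so a detector can reproduce the split."""
    g = np.random.default_rng(prev_token)
    return g.random(vocab_size) < 0.5

def watermarked_sample(logits: np.ndarray, prev_token: int) -> int:
    """Sample the next token after nudging green tokens upward.
    The bias shifts word choice slightly but leaves fluent alternatives intact."""
    biased = logits + delta * green_mask(prev_token)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(vocab_size, p=probs))

# Example: one generation step with made-up logits.
logits = rng.normal(size=vocab_size)
next_token = watermarked_sample(logits, prev_token=123)
print(next_token)
```

Because every token choice carries a small piece of the signal, a paraphrase has to rewrite a large share of the text to wash it out, and a matching detector can recompute the same green/red split to test for the bias.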

At the same time, the watermark has to blend naturally into the AI’s usual word choices, so the output remains smooth and human-like — especially as models like GPT-4, Claude, and Gemini become increasingly difficult to tell apart from real writers.

“If the watermark changes the way the AI writes — even just a little — it defeats the point,” Su said. “It has to feel completely natural to the reader, no matter how advanced the model is.”

The study helps address this challenge by offering a clearer, more rigorous way to evaluate how well watermarking performs — an important step toward improving detection as AI-generated content becomes harder to spot.