By Jackie Davalos and Leon Yin

The best AI writing detectors are highly accurate, but they’re not foolproof. Businessweek tested two of the leading services—GPTZero and Copyleaks—on a random sample of 500 college application essays submitted to Texas A&M University in the summer of 2022, shortly before the release of ChatGPT, effectively guaranteeing they weren’t AI-generated. The essays were obtained through a public records request, meaning they weren’t part of the datasets on which AI tools are trained. Businessweek found the services falsely flagged 1% to 2% of the essays as likely written by AI, in some cases claiming near-100% certainty.

Even such a small error rate can quickly add up, given the vast number of student assignments submitted each year, with potentially devastating consequences for students who are falsely flagged. As with more traditional cheating and plagiarism accusations, students accused of using AI to do their homework are being made to redo assignments and are facing failing grades and probation.

The students most susceptible to inaccurate accusations are likely those who write in a more generic manner, whether because they’re neurodivergent like Olmsted, speak English as a second language (ESL) or simply learned to use more straightforward vocabulary and a mechanical style, according to students, academics and AI developers. A 2023 study by Stanford University researchers found that AI detectors were “near-perfect” when checking essays written by US-born eighth-grade students, yet they flagged more than half of the essays written by nonnative English speakers as AI-generated. OpenAI recently said it has refrained from releasing an AI writing detection tool in part over concerns it could negatively affect certain groups, including ESL students.