Beyond Ineffective: How Unreliable AI Detection Actively Harms Students
A free section from my course about why we shouldn't be using unreliable AI detectors in education
I started this blog with a plea to move away from surveilling students and to resist adopting AI detection. With the start of the fall semester, many universities and schools that rushed to adopt AI detectors or ban ChatGPT have since done an about-face or turned the detectors off, because they don't work reliably and likely never will.
It's my hope that this fall puts an end to so-called AI detectors in education, but the underlying desire to monitor students through surveillance isn't going anywhere. We're sure to see calls for increasingly intrusive measures, and we should resist them.
Below are just a few of the most recent updates:
From OpenAI’s How can educators respond to students presenting AI-generated content as their own?
Do AI detectors work?
In short, no. While some (including OpenAI) have released tools that purport to detect AI-generated content, none of these have proven to reliably distinguish between AI-generated and human-generated content.
Additionally, ChatGPT has no “knowledge” of what content could be AI-generated. It will sometimes make up responses to questions like “did you write this [essay]?” or “could this have been written by AI?” These responses are random and have no basis in fact.
From the NYT's Despite Cheating Fears, Schools Repeal ChatGPT Bans
As schools reopen for fall, educators and district leaders are wrestling with complex questions posed by the A.I. tools: What should writing assignments look like in an era when students can simply employ chatbots to generate prose for them? How can schools, teachers and students use the bots effectively and creatively? Does it still count as cheating if a student asks a bot to fabricate a rough draft that they then rewrite themselves?
Vanderbilt University shuts down the AI detector within Turnitin because of too many false positives:
To put that into context, Vanderbilt submitted 75,000 papers to Turnitin in 2022. If this AI detection tool was available then, around 3,000 student papers would have been incorrectly labeled as having some of it written by AI. Instances of false accusations of AI usage being leveled against students at other universities have been widely reported over the past few months, including multiple instances that involved Turnitin (Fowler, 2023; Klee, 2023). In addition to the false positive issue, AI detectors have been found to be more likely to label text written by non-native English speakers as AI-written (Sample, 2023).
The University of Pittsburgh also shut off the AI detector within Turnitin:
Based on their professional judgment, the Teaching Center has concluded that “current AI detection software is not yet reliable enough to be deployed without a substantial risk of false positives and the consequential issues such accusations imply for both students and faculty. Use of the detection tool at this time is simply not supported by the data and does not represent a teaching practice that we can endorse or support.”
From the recently published study Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text:
While this study indicates that AI-detection tools can distinguish between human and AI-generated content to a certain extent, their performance is inconsistent and varies depending on the sophistication of the AI model used to generate the content. This inconsistency raises concerns about the reliability of these tools, especially in high-stakes contexts such as academic integrity investigations. Therefore, while AI-detection tools may serve as a helpful aid in identifying AI-generated content, they should not be used as the sole determinant in academic integrity cases.
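The Vanderbilt figures above also point to a base-rate problem: because the vast majority of submitted papers are written by humans, even a modest false-positive rate produces thousands of false accusations and makes any individual flag far less trustworthy than the advertised accuracy suggests. Here is a minimal back-of-the-envelope sketch of that arithmetic; every rate in it is an illustrative assumption, not a figure published by Turnitin or Vanderbilt.

```python
# Back-of-the-envelope illustration of why false positives dominate at scale.
# All rates below are assumptions for illustration only.
papers_submitted = 75_000     # Vanderbilt's 2022 submission volume, per the statement above
false_positive_rate = 0.04    # assumed: 4% of human-written papers get flagged
true_positive_rate = 0.80     # assumed: 80% of AI-written papers get flagged
share_actually_ai = 0.05      # assumed: 5% of papers contain AI-generated text

ai_papers = papers_submitted * share_actually_ai
human_papers = papers_submitted - ai_papers

false_accusations = human_papers * false_positive_rate
correct_flags = ai_papers * true_positive_rate

# Of all flagged papers, what fraction actually used AI?
precision = correct_flags / (correct_flags + false_accusations)

print(f"Students falsely accused: {false_accusations:.0f}")
print(f"Chance a flagged paper really used AI: {precision:.0%}")
```

Under these assumed rates, roughly 2,850 students would be falsely accused, and a flagged paper would have only about a 51% chance of actually involving AI, which is why a detector's headline accuracy says very little about how much weight a single accusation deserves.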
Education panicked over ChatGPT and rushed to adopt a technological solution to deal with generated content instead of spending the resources and time to develop alternative, critical approaches to generative AI in education. I recently wrote a post for WCET Frontiers about what we've been trying here at the University of Mississippi to help train faculty: The Promise and Challenges of AI in Higher Ed.
I also did an interview for the Observer about how generative AI forces educators to reckon with some core questions about what it means to learn or why students pay money to go to college.
Watkins concurred, “We’ve been so focused for the past almost 50 years on telling everyone ‘you have to have to have a college degree and everything else to have a life’ that we’ve lost the fact that part of the reason you go to college isn’t to get to that degree, but to teach yourself to ask questions about the world. And I think generative A.I. will maybe drive a much deeper conversation about what that question is.”
I’m including a free module from my online course about the challenges AI detectors pose to education.
AI Detectors: A Digital Arms Race
Video Chapters
00:00 AI Detection Introduction
00:57 How We Teach Works
01:40 AI Detectors
04:25 Unreliable
05:40 Bias
06:45 False Positives
07:44 OpenAI's Text Classifier
08:58 FTC Warning
10:15 TurnItIn AI Detector
11:17 Surveillance
12:58 Bluebooks
16:05 Trust Matters
18:58 Ethical Principles for AI Detection
20:30 Principles Continued
Pause Before Uploading Student Work
When we upload work for a plagiarism check, we can do so because students have granted us an extremely limited license to check their work in the context of academic honesty. Students still hold the copyright to their work, and we need to respect their data rights. Uploading student work to third-party websites always requires clear consent.
Institutions have had no time to vet any of these so-called AI detectors. This is unlike plagiarism software vendors, which are contracted, go through testing, have legally binding user agreements, and face legal consequences based on how they handle student data. Even that isn't foolproof against unethical behavior.
If we upload student writing to one of these detectors, we have no idea what happens to students’ data. Will these companies store student data? Will they sell student data? Will they use student data as training material for future LLMs? As far as I’m aware, there has been no discussion about whether any of these AI detection systems is FERPA compliant.
None of these systems has been rigorously tested. GPTZero was made by a college student in a coffee shop over winter break, and the developer just released an API for educators and institutions to use. Do we think uploading student writing to an app built in such haste is a reasonable move on our part?
As far as I can tell, all current AI detection software uses older versions of LLMs as its main detection mechanism. If educators use one of these tools, we are choosing to embrace AI to try to catch AI. It also isn't possible to fully vet such technology because these are black-box systems: we don't know how they were trained, what they were trained on, or how bias functions within them.
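For readers who want a concrete picture of what "using AI to catch AI" looks like, below is a minimal, hypothetical sketch of the perplexity-style scoring these tools are widely believed to rely on. The choice of GPT-2 and the idea that "low perplexity means AI-written" are assumptions for illustration only; since the commercial systems are black boxes, we can't confirm any vendor's actual pipeline.

```python
# A minimal sketch of perplexity-based AI detection, assuming the common
# approach of scoring text with an older language model (GPT-2 here).
# This is an illustration, not any vendor's actual detection pipeline.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return how 'surprised' GPT-2 is, on average, by each token of the text."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing the input ids as labels makes the model return its
        # cross-entropy loss over the text.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

sample = "The mitochondria is the powerhouse of the cell."
print(f"Perplexity: {perplexity(sample):.1f}")
# Lower perplexity = more predictable text, which this kind of heuristic
# treats as "AI-like." That is exactly why formulaic but entirely human
# writing, including much prose by non-native English speakers, gets misflagged.
```

The sketch also makes the circularity plain: the judgment rests entirely on what an older model happens to find predictable, not on any evidence about who actually wrote the text.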
Rundown of Current AI Detection Systems
Activity: Evaluate Real or Fake Text
You can play the Real or Fake Text game and use the following activity with fellow teachers or your students:
Further Resources
Most sites claiming to catch AI-written text fail spectacularly
We tested a new ChatGPT-detector for teachers. It flagged an innocent student.
The Use of AI-Detection Tools in the Assessment of Student Work
Found this Helpful? Sign up for My Professional Development Course!
If you found the above section helpful, consider joining my course. Generative AI in Education has over twenty sections. I've released assignments from the course under free CC-BY licensing, and I offer discounted group pricing, scholarships, and OER Fellowships for access to the course.