The Pangram Chrome Extension lists many of my essays as 100% human (all that I tested) even though I use and disclose AI assistance. Never trust its output! The AI detection companies are gaslighting people with accuracy claims that fall apart as soon as you have a closer look.
That's a false negative, and false negatives are part of the system by design. The system deliberately takes the "beyond a reasonable doubt" approach in the precision/recall tradeoff: it accepts missing some AI text in order to avoid falsely flagging human writing.
So, we are now trusting an AI system “beyond a reasonable doubt” because we do not like how people are using another AI system? “Beyond a reasonable doubt” is a mathematical impossibility in AI.
By pure fluke, I read this right after listening to The Atlantic's interview with the founder of Pangram.
Thank you for clearly articulating 90% of my skin crawl moments from that discussion.
Using automation to solve the problems of automation does not remove the problems of automation.
People really, really, really want a silver bullet for dismissing AI, and companies will produce what people want. Looks like, yet again, there is no substitute for careful consideration.
This has been my worry with programs like this.
No idea what it would say about my posts…
This reminds me of The Simpsons episode 'Krusty the Clown', which I just watched with my son. Homer becomes a famous TV recapper by assigning letter grades to shows, and even though he writes other text about each show, essentially the only thing anyone cares about is the grade. All nuance and insight are flattened into an ontological nullity.
https://www.youtube.com/watch?v=DyEcKozvN-U
As I was reading your post, I kept thinking, What if Pangram didn't use red, yellow, and green colors for its labels and instead used something more neutral? Then I got to this paragraph in your post:
"Would you still buy a house with a listing that has an AI label next to it? Trust product reviews that were labeled as being written by human beings, even if they may have been written as part of a paid promotion? Find a different doctor or dentist because the text on their homepage had an 'AI-assisted' label next to it?"
The red, yellow, and green labels assume some kind of value system, but without any attention to the actual context of the text. And that context matters! If Pangram were to shift to some other color scheme, that would do a bit of what Deep Background does, namely, inviting the reader to explore that context more, not less.
Yeah, that'd be ideal for the type of deep and meaningful engagement we'd like to see in terms of vetting and engaging with content. Sadly, I don't think they're interested in doing that because it isn't efficient.
I hate AI, and I love that I can provide such a slam-dunk discrediting example for this.
I love the way you use it with an open mind for a while but come back to the impact on you, because the most important part of any evaluation of any tool is not what it does, but what it does to the tool user.
This is exactly why we need AI literacy. Use AI to ask better questions, evaluate information, and scrutinize the output. Don’t just let a tool decide if it’s real or not. The bigger problem is that these tools create a weird psychological shift where people trust the label, then disengage from anything labeled AI, and assume human equals better or more trustworthy. We all know that humans can be wrong, can manipulate, and can write absolute garbage. At the same time, AI can be useful and accurate, and can help explain things. The label takes out all of the nuance.
100% agreed - normalize AI disclosure. The challenge is in the norming, as it has been for discourse since...discourse began? But when discourse was face to face, in-person conversation included, and still includes, so many small gestures of hearing, clarifying, pausing, attentiveness or inattentiveness. Disembodied language has been involved in norming battles since the '90s. All this to say, while I agree with your stance and appreciate your whole post, I also see this as a most difficult battle. Makes me grateful for my in-person classroom dynamic.
Really enjoyed this, and appreciated you taking the time to avoid the usual hot takes in this space. And, if you don't mind, I'd like to complicate things a little.
Rie Kudan used ChatGPT to write roughly 5% of 'Sympathy Tower Tokyo'—not to save time, but because she needed a character voice she couldn't reach by trying: polished, affectively smoothed, fluent without friction. Her judges called it practically flawless without knowing how it was made. The 5% isn't separable from the rest of the text in any meaningful way that a classifier could find, because the writing required the model precisely because she couldn't reach that register alone.
Which suggests Pangram isn't just inaccurate: it's asking the wrong question. The human / AI boundary it polices doesn't correspond to anything real about how knowledge is made—which puts it in familiar territory.
I'm reminded of Leviathan and the Air-Pump, which showed that the Royal Society excluded Hobbes and the craftspeople who made Boyle's air pumps actually work, drawing a line that looked epistemic but was doing social work. Boyle's experiments could not be replicated from the written accounts alone, because replication followed the skilled hands, not the manuscripts.
My only point is that False Negatives are by design. False Positives are where the concern lies.
Marc, your piece is both brave and deeply revealing.
The moment you describe — when you started running Substack posts, tweets, and LinkedIn articles through Pangram and suddenly began to distrust even good writing — is exactly the psychological poison I’ve been warning about. This is how the architecture of accusation quietly destroys the foundation of intellectual life: trust.
When a probabilistic tool becomes the default lens through which we read each other, we don’t just question authorship. We begin to question competence itself. Clear style becomes “suspiciously structured.” Strong argumentation becomes “too polished.” That is the real damage.
I explored this mechanism in detail in my recent essay “Probably. How a single hedging word became the most powerful tool of institutional accusation in modern intellectual life — and what it is actually built to do," where I argue that we are no longer fighting misuse of AI — we are watching fear of AI being institutionalized into a new system of control:
https://olegmaltsev.substack.com/p/probably-the-most-dangerous-word-in-intellectual-life
Your personal experiment makes the abstract danger concrete. Thank you for writing it.
Reading while revising a manuscript...it’s one more reason I don’t trust AI detectors as judges of either learning or authenticity.
Thank you for this very helpful post.
Did you try typing out a section of a book (one you know wasn’t written with AI) and having Pangram analyze it?