When "Vibe Checks" Fail

May 04, 2025

Pixabay/Mohamed_hassan Tech Companies Are Seeking to Develop AI that Excels at Flattering and Manipulating US

Systems like ChatGPT are being updated all the time. You’ll soon be able to shop via chatbot and many of the developers are rushing to implement updates that create an “intention economy” where early signals from users are mapped for an advanced AI system to predict and interact with them. It’s common for a tech company to ship an update only to roll it back a few days later because it had bugs. But what happens when a bug is behavioral in a system designed to mimic human understanding and used by nearly half a billion users each week?

OpenAI’s recent update to ChatGPT caused many users to complain that the outputs to seemingly normal questions were a bit too complimentary and involved a disturbing level of flattery. What Sam Altman and users on X started to refer to as ‘glazing’ on the part of ChatGPT—a type of behavior they’ve closely tied to the concept of sycophancy— led OpenAI to rollback the latest update.

To OpenAI’s credit, they caught the issue early and acted. They’ve also attempted to be transparent about how they intend to deal with issues going forward, but this isn’t simply a challenge OpenAI has encountered and isn’t one that has a simple solution. Generative AI is deployed as a massive social experiment, much like blockchain, social media, online gambling, and any number of things that arrived with the internet age. Change arrives rapidly now that we have a computer in our pocket at all times.

I struggle to believe generative AI should be treated like a normal technology, so do the creators of tools like ChatGPT. LLMs are already capable of superhuman persuasion. The real danger isn’t one bad update—it’s the industry-wide pattern of prioritizing scale over caution.

What Happened?

OpenAI realized months ago that an emerging use case for ChatGPT wasn’t just for writing emails or students blowing off learning. One of the top emerging use cases for the conversational mode of ChatGPT was as a therapist or companion. Thus, they began experimenting with ChatGPT’s response to make it more human-like. Where they likely screwed up was by amping the baseline output to make responses super validating, even when the questions were ridiculous or trivial—glazing.

What Was Changed?

It isn’t clear from the report what was specifically changed that caused the system to heap false praise onto users, but sleuths on X supposedly found system prompts and noted that some small phrases had been inserted into the model’s behavioral directions, which likely led to the issue. Simon Willison shared the following before and after system prompt on X:

OpenAI’s own postmortem gets points for transparency but also raises some truly alarming questions. Namely, their internal safety checks by human engineers referred to as “vibe checks” were ignored in favor of machine analytics. They trusted a machine instead of the human in the loop and have a culture where human safety is colloquially referred to as ‘vibes.’ That’s alarming. But it is also a problem for many AI developers.

Last year, Google rolled back an update to its image generator and temporarily disabled its ability to depict humans because the model inaccurately depicted historical figures. Asking for a picture of the founding fathers produced a racially diverse cast of characters straight out of a production of Hamilton. Asking for pictures of German soldiers during WW2 produced equally diverse and historically inaccurate results. The developers over-corrected for bias in image models, resulting in diverse characters being generated at the expense of accuracy.

Small changes have big consequence in this weird generative era and there’s no telling what second-order effects exposure to this technology will have on society. Even OpenAI recognizes some of what is at stake:

One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice—something we didn’t see as much even a year ago. At the time, this wasn’t a primary focus, but as AI and society have co-evolved, it’s become clear that we need to treat this use case with great care. It’s now going to be a more meaningful part of our safety work. With so many people depending on a single system for guidance, we have a responsibility to adjust accordingly. This shift reinforces why our work matters, and why we need to keep raising the bar on safety, alignment, and responsiveness to the ways people actually use AI in their lives.

Who is Regulating What?

If generative AI usage continues to grow, then it will likely cross over a billion worldwide users at some point this year. OpenAI’s ChatGPT and Google’s Gemini have the largest chunk of weekly/ monthly users, but thousands of other companies are also included in that mix. The lack of clear federal regulations means each company is left to decide how to police its conduct and what safety and transparency procedures they choose to put into place. Imagine what our world would look like if a public utility like electricity were treated in such a way.

But generative technology doesn’t just simply impact infrastructure. Like social media, its influence is on user's behaviors, and trying to identify what will be impacted and what requires oversight isn’t always clear. Perhaps the greatest criticism of any tech company is their method of “moving fast and breaking things” has caused a series of second-order effects on society that isn’t easy to fully articulate. When phrases like “brain rot” become the word of the year, I think we can assume our current relationship with technology isn’t entirely positive.

Maybe this isn’t a problem that can ever truly be addressed. Even OpenAI admits “our evals won't catch everything: We can't predict every issue,” which should be obvious to anyone with systems this complex and with this many users. Will we come to see issues like sycophancy in model interactions treated like a power outage—something we’ve come to expect as just part of living in the modern world? Or will it cause a further erosion in public trust and social interaction, as social media has for information and relationships? Perhaps separating the issue into a binary isn’t even the correct framing. What, then, is?

Developing a Socio-Cultural Harms-First Framework for AI Deployment

I’ve echoed Alison Gopnik’s framing of generative AI as a cultural technology several times in the past year. To my mind, there’s no better way to discuss GenAI’s impact on our world than to view it as a cultural nexus. Viewing generative AI through that lens may help regulators and policy experts approach this technology from more than just the viewpoint of innovation alone. Adopting a framework that focuses on potential harm to society and social interaction is needed now more than ever.

Long before transformers arrived, Neil Postman coined the term Technopoly to reference how modern culture cedes authority and decision-making to machines, in often subtle ways. Postman was careful to note that technology held promise and peril in near equal measure. What gave him pause was noting how little awareness there was in society of how quickly technical adoption changed human interaction:

New technologies alter the structure of our interests: the things we think about. They alter the character of our symbols: the things we think with. And they alter the nature of community: the arena in which thoughts develop.

I think tech companies should develop benchmarks to speak to such changes.

Implementing a Harms-First Approach for Culture

Part of taking a harms first approach to emerging technology should account for social changes. Doing that means altering the scope and speed of integrating new features and updates.

It asks for companies to stop treating the general public as beta testers and start looking at cultural and societal impacts in smaller trials that involve thousands of users—not hundreds of millions.
Schedule phased updates that are transparent. Give the public time to process changes that are roadmapped and planned, not simply appear unannounced with the next system update.
Shift focus on how adoption changes user habits beyond efficiency and being helpful at tasks. Take a principled approach that this technology should benefit society rather than drag it down. Part of doing that means being open to hearing from critics who tell you your product isn’t helping them, but actively harming some aspect of their life.
Expand safety benchmarks to focus on impacts to human behaviors, not simply how good new models are at completing human-like tasks or broader safety risks. Can AI develop a WMD is obviously a safety issue, so too, is its impact on human relationships and user’s sense of reality.
Embrace third-party audits. Building any sort of internal safety structure will always be flawed because company culture can bias results (referring to internal human review as “vibe checks” is an example of this).
Allow critical voices to have input in evaluations. Critics often discover flaws in complex systems that even the most rigorous evaluations miss.

Take the ‘near term harms’ blindspot seriously

A chief frustration I have with safety and alignment teams is far too much capital and resources have been invested in preventing science fiction-like dystopias caused by advanced AI systems. Research into near-term harms posed by frontier model behaviors and human interactions using this technology remains underfunded and has to be taken more seriously. Right now, we’re seeing a cascade of so-called near-term harms impact our world. Rampant deep fake revenge porn is happening even in middle schools. Millions of students are using AI and many of them are doing so in ways to avoid learning or offloading higher-order cognitive tasks. Workers who identify as super users of the technology report an erosion of core critical thinking skills in their day-to-day interactions.

The burden shouldn’t fall on everyday users to identify when AI is subtly distorting our behaviors. Tech companies must slow down, listen to critics, and adopt a more serious regulatory mindset even before external regulation arrives. If none of these near term harms can be easily addressed now then they likely won’t be reversed in the future and could domino leading to catastrophic risks. That poses an increasing challenge to how society views generative AI.

The generation that uses generative AI today will very likely become the first generation who votes on serious regulations for even more powerful systems of tomorrow. If AI developers continue to rush deployments without considering the second-order consequences this technology poses for society, then they risk an entire generation souring on the marketing promise of AI leading to human flourishing, much like our current discourse around how toxic social media has become for human interaction. Those people will very likely vote for anti-AI laws.

Thanks for reading Rhetorica! This post is public so feel free to share it.

Marcus Luther

May 4

"Tech companies must slow down, listen to critics, and adopt a more serious regulatory mindset even before external regulation arrives."

Of course I completely agree with this—but we also know that is isn't happening and is never going to happen, right?

It also feels like the larger imbalance that I'm noticing/feeling of late in which it feels like there is a gaslighting towards those who are critical about the speed and manner of AI adoption to be "open-minded" and "nuanced" in our views (which I try to be!) but then we see those creating and pushing AI doing none of that.

At this point, it is getting tiring to keep trying to hold onto even a semblance of open-mindedness and nuance these days..

Expand full comment

Guy Wilson

Tech companies will do this only if forced to by government (which will not happen) in the US (and probably not effectively in the UK or EU) or by a massive public backlash. If you want to see this happen, there will have to be an organized movement and boycotts of AI companies, AI-enabled software and operating systems, companies that are known to replace workers with AI, etc.

4 more comments...