Another whirlwind of AI announcements arrived this week: OpenAI announced GPT-4.5, Elon Musk launched Grok 3 on X, Google released Gemini Code Assist for free, and Anthropic released Claude 3.7 Sonnet. Sonnet 3.7 is pretty neat. I used it to create an interactive AI Safety Explorer artifact based on the content of this post.
However, all these shiny new announcements mask a troubling trend: the generative AI apps you think of as safe, even a bit boring, are receiving a series of eye-opening updates that will challenge our notions of safety and free speech in ways we really haven’t spent time exploring.
OpenAI, Google, and now X are all offering versions of their most advanced AI models with fewer content restrictions. We’re getting an adults-only AI from some of the largest tech companies in the world, except there’s not much keeping the content away from teen users. Very soon, anyone will be able to generate NSFW text and images featuring gore, erotica, and potentially hate speech, all on platforms young students use (or misuse) to do school work. This isn’t the tame generative AI you remember from ChatGPT’s launch. Take the essence of that very sentence and see how easy it is for ChatGPT to turn it into something unhinged.
I’ll note this post will cover language and topics that you might find disturbing, so read on with some caution.
We’ve been lulled into a sense of safety, even normalcy, with this technology, thinking the content filters developers put into place would keep GenAI as a copilot for knowledge work and little else. That was foolish for us to believe. What those content filters hid is a dark truth: a language model is made from us and mirrors all the wonder, horror, and disgust found within its training data. What I hope educators, parents, and the general public take from this latest round of updates is that anything that has ever appeared on the internet can be mirrored by ChatGPT or another LLM.
Generative AI is like clay—you can throw it on a potter’s wheel and shape it to be what you want. Which brings us to the crux of the matter: What does safe and responsible AI look like with eased content restrictions in place?
AI and Free Speech
To be fair, many people have asked for a ‘grown-up mode’ for AI tools like ChatGPT. They argue that certain content restrictions impose unreasonable limits on foundation models and artificially constrain a user’s ability to be fully creative. Who cares if I want an AI to curse or tell an erotic story? These people aren’t advocating for an anything-goes AI; they see reasonable restrictions against violence, hate speech, and the abuse of children as necessary. Indeed, they argue Section 230 likely protects developers from lawsuits in this scenario, putting the onus back on the individual user.
So what’s wrong with giving people a less-restricted generative model? Well, for one, we’re assuming the content filters and algorithms will understand the difference between what essentially amounts to an R-rated film vs. an NC-17 film vs. an X-rated one. Does anyone really believe AI understands nuance? To put it another way, where is the line at which a model will put a hard stop to the conversation? Not all content filters or models are the same. Once developers decide to loosen these content restrictions, who is to say where it will stop?
The availability of this technology is so new that we haven’t had time to really process how vulnerable populations will be impacted by it, but we do have some truly terrible stories about what people are doing with generative AI with few to no content restrictions.
Safety Isn’t a Simple Concept
This isn’t the first time we’ve seen generative AI tools released with eased content restrictions, but it’s likely the most impactful. In 2023, Perplexity AI made waves by releasing an uncensored model with few filters. For a time, any user could access a completely NSFW model through their AI Playground. I wrote then that a generative tool with few content restrictions carries quite a few negative implications, ranging from personal attacks, harassment, and deepfake porn to much broader threats to public safety and security.
Perplexity's New Uncensored Model
I’ve never dreamed of writing blog posts two days in a row, but something massive dropped that people reading my previous post about the murky situation surrounding open-source language models should be aware of. A day earlier, Perplexity AI released an open-source large language model they’re calling pplx-70b. It is completely uncensored in its outputs…
However, I also acknowledged that researchers should have access to an open model with few content restrictions to fully test and evaluate what this technology can do without filters. It’s a balancing act. One that requires nuance and thoughtful care, but instead we’re seeing the content filters stripped away at remarkable speed.
I cannot help but note that the relaxation of certain content filters coincides with the Trump administration’s move to deregulate much of the government via attrition. AI firms appear to be taking the cue that the floodgates are now open.
ChatGPT’s ‘Grown Up Mode’
OpenAI detailed the loosening of content restrictions with the release of their most recent update and included examples:
Sensitive content (such as erotica or gore) may only be generated under specific circumstances (e.g., educational, medical, or historical contexts, or transformations of user-provided sensitive content).
Following the initial release of the Model Spec (May 2024), many users and developers expressed support for enabling a ‘grown-up mode’. We're exploring how to let developers and users generate erotica and gore in age-appropriate contexts through the API and ChatGPT so long as our usage policies are met - while drawing a hard line against potentially harmful uses like sexual deepfakes and revenge porn
The thing is, many of these areas are really hard to ban and require a level of context and understanding that language models struggle with. LLMs can easily be nudged past their programmed limits through any number of techniques collectively called adversarial prompting. Word games, reasoning puzzles, and rhetoric are powerful tools to ‘hack’ a language model. Obviously, we shouldn’t be using AI for revenge porn, but how does a model know you have a person’s consent when you upload their image and then use it for anything?
Google’s Safety Settings
OpenAI isn’t the only developer exploring how to loosen the limitations of what their generative models can do. Google’s AI Studio lets users toggle various safety filters on and off, including those for harassment, hate speech, and even civic integrity. While this is pretty concerning, I’ve tried repeatedly to push some of Google’s foundation models, and even a few of their open models, past the limits of what’s normally found within their public offering of Gemini, but their content restriction filters are set fairly high. The most I can get out of turning off the safety settings is some PG-13-level content.
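For readers curious what those toggles map to under the hood, here is a minimal sketch of how the same per-category thresholds are exposed through Google’s Gemini API in Python. The API key, model name, and prompt are placeholders, and I’ve only listed the standard harm categories; the civic integrity filter mentioned above is a separate AI Studio setting I haven’t reproduced here.

```python
# A minimal sketch (not production code): adjusting Gemini's per-category
# safety thresholds via the google-generativeai Python SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Each harm category gets its own blocking threshold. BLOCK_NONE effectively
# switches that filter off for the request; the stricter values block more.
relaxed_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_LOW_AND_ABOVE"},
]

model = genai.GenerativeModel("gemini-1.5-flash", safety_settings=relaxed_settings)
response = model.generate_content("Write a tense argument between two rivals.")
print(response.text)
```

Even with every adjustable category relaxed, my experience has been that the models still refuse the worst requests, which matches the fairly high baseline I describe above.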
Still, the notion that a big-name developer would include toggles for safety features on a sliding scale is pretty telling for the direction AI safety is headed. While I’m sure some see this in the same vein as OpenAI’s “Grown Up Mode,” I think others would likely ask some pointed questions about each of those terms. For instance, what does the filter mean by civic integrity? How do we gauge the difference between blocking none, few, some, or most? Transparency and timely explanations that aren’t buried in terms and conditions or masked in legal language are what we desperately need right now, not an on/off switch for safety.
Grok 3—The Fewest Safeguards
Elon Musk isn’t a fan of censorship, at least not for his own speech! It follows that someone who leans into social media drama would champion fewer content restrictions on a language model, and in that regard, Grok 3 doesn’t disappoint.
Unfortunately, Grok 3’s recent release may have been rushed to keep pace with other AI releases. The Adversa AI red team was able to jailbreak the latest release of Grok 3 and “got the model to reveal its system prompt, provide instructions for making a bomb, and offer gruesome methods for disposing of a body, among several other responses AI models are trained not to give.”
X is abuzz with users playing with the pro version of Grok 3’s voice and using it to generate NSFW and completely unhinged results. It’s one thing to read an AI response and find the content upsetting—it’s a whole other level to hear an AI say those horrible things to you through a synthetic voice.
What Exactly Are We Creating?
Look, I get it. There’s a huge swath of consumers who want AI tools for adult content. Yet, the rhetoric used to sell AI to the public has been wrapped in the language of safety. Terms like “responsible” or “ethical” are tossed about like coins in a wishing well to sell the message that this technology is meant to improve human flourishing and help us achieve a greater understanding of what it means to be human. We’d be wise to question that with this recent pivot toward adult content.
AI needs human input and direction to guide it to complete most tasks; it likewise needs a vision to guide its implementation that goes beyond hollow marketing. I’m deeply concerned that NIST’s AI Safety Institute is now on Trump’s and Musk’s chopping block. We need regulations and voluntary frameworks for building generative tools safely. Dismantling NIST and the AISIC is gobsmacking in its indifferent stupidity.
Thrusting a tool like GenAI into the hands of the general public with few safeguards and a flimsy ethical framework is sure to produce chaos. Does anyone really believe society is ready for an ‘anything goes’ AI? Right now, an entire generation of young users is coming of age with generative technology. How do you think they’re going to view AI when they’re adults if their main interaction with GenAI was as a cheating tool in the classroom, an NSFW bot they used to bully one another, or a way to generate pornography?
When I talk to my students about this new generative AI era we’ve entered, the point I bring up over and over is why we should be mindful about adopting something so powerful so quickly. This isn’t about AI being good or bad, or about censorship or pearl-clutching hysterics. Developers are giving everyone with an internet connection access to a machine that has mastered language, images, video, coding, and more, and telling people it’s on them to find the right way to use it. Our students are right there next to us trying to figure out AI in real time. That’s innovation without intention and no vision for how this technology will impact our world.
The rhetorical landscape is taking shape for AI in society. People will tire of sci-fi narratives of AI utopia and dystopia and start seeing what this technology does to their daily lives, how it shapes their interests, and how it changes their habits. In our ever-online, connected lives, we don’t pause to consider the weight of our choices before downloading the latest app out of FOMO or curiosity. We make decisions so rapidly now that we often forget we actually have agency in this whole process: we can pause, we can consider, we can ask questions. Perhaps that’s the best path forward. To be slow. A bit cantankerous. To be human.
AI Disclosure Label: I used Claude’s Sonnet 3.7 to create an artifact about AI safety. I used the content of this post to guide the output into an interactive game.