19 Comments

Nailed it. Well done, congratulations. See my latest for another data point.

Good text on an important topic.

Grounding it all in computer basics helps dissipate the illusion and demystify the process.

Prompt = input, command line

Inference = a program that converts the input string to retrieval coordinates in the model and iteratively samples the model to produce an output

Model = a resource file

Training = sampling across a huge body of text to perform lossy multi-level compression

LLM = misnomer, should be "large text model" as "language" implies non-existent semantics

Thinking = misnomer as it implies non-existent semantic abstraction

Reason = misnomer as it implies mechanics such as deduction and induction, and presence of non-existent ontologies, on top of the non-existent language and thinking

Hallucination = misnomer as it implies an inverted frame where syntax correlates with external referents rather than just statistical relations between different length runs of tokens

Token = misnomer as it implies a link from symbol to meaning. A token represents something, but in this context it represents nothing outside itself

And so on
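In that spirit, here is a minimal sketch of what "inference" amounts to under this framing. Everything in it is illustrative: `model`, `tokenizer`, and `next_token_distribution` are hypothetical stand-ins, not any real library's API.

```python
import random

def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=1.0):
    """Inference as an iterative sampling loop over a static resource (the model)."""
    tokens = tokenizer.encode(prompt)  # prompt = input string turned into token ids
    for _ in range(max_new_tokens):
        # The model returns a probability distribution over the vocabulary,
        # reflecting only statistical relations between runs of tokens.
        probs = model.next_token_distribution(tokens, temperature)
        next_id = random.choices(range(len(probs)), weights=probs, k=1)[0]
        tokens.append(next_id)
        if next_id == tokenizer.eos_id:  # stop when the end-of-text token is sampled
            break
    return tokenizer.decode(tokens)  # output = another string of tokens, nothing more
```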

Nice analogy between the Turk and the reasoning models.

Overall, I mostly agree with your argument here: the more we treat these models as reasoning agents, the more prone we are to fall for automation bias and cognitively offload critical thinking to AI.

A fundamental reason why the idea of reasoning, especially as introduced by OpenAI with the o1 model, is so prevalent and so heavily emphasized throughout their PR and marketing is that if the models were actually able to reason, it would make a better case that the models are not infringing copyright.

If, on the other hand, the models are mostly doing pattern matching and a sophisticated version of information retrieval, then all the data that was used to train them... is basically still in there, just vectorized, which is probably technically true, and hence there is a copyright issue.

Hence OpenAI is so eager to frame the process as reasoning.
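As a toy illustration of the "still in there, just vectorized" framing, here is a sketch of retrieval by vector similarity over embedded snippets. The `embed` function below is a deliberately crude stand-in (a hashed bag of characters), not how any production system represents its training data:

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy embedding: hash characters into a fixed-length unit vector.
    A real model learns far richer vectors, but the principle is the same:
    text in, coordinates out."""
    v = np.zeros(dim)
    for ch in text.lower():
        v[hash(ch) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

corpus = [
    "the cat sat on the mat",
    "copyright law protects original works of authorship",
    "transformers predict the next token from context",
]
vectors = np.stack([embed(s) for s in corpus])   # the source text, "vectorized"

def retrieve(query: str, k: int = 2):
    """Return the k stored snippets closest to the query in vector space."""
    sims = vectors @ embed(query)                # cosine similarity (unit vectors)
    return [corpus[i] for i in np.argsort(-sims)[:k]]

print(retrieve("who holds the copyright?"))
```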

Honestly, I have a hard time finding a use case for reasoning AI. I get way better results when I control the "reasoning" of AI with prompts, knowledge bases, and chain of thought.

Reasoning AIs are just models trained to use chain of thought automatically.
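And the "manual" version described above is essentially just prepending the step-by-step instruction to the prompt yourself. A rough sketch, where `call_llm` is a placeholder for whatever model or API is actually used and the prompt wording is only illustrative:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whatever model/API you actually use."""
    raise NotImplementedError

def manual_chain_of_thought(question: str) -> str:
    # A "reasoning" model is trained to produce intermediate steps on its own;
    # here we simply ask for them explicitly in the prompt instead.
    prompt = (
        "Work through the problem step by step, "
        "then state the final answer on its own line.\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```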

Yeah … I’d rather do the chain of thought myself in most instances.

Very well said, Marc. If I could draw a conclusion from this, it would be that the real danger of generative AI isn't super-human intelligence; it's handing our trust over to the mechanical Turk's untrustworthy decision-making output.

Thank you! I've seen articles on LinkedIn about how companies are using AI, and a common thread in many of them is that the writers try very hard to personify LLMs by saying they have human traits like curiosity and reasoning. As if the AI consciously follows a train of thought.

I am concerned that the people making the decisions to rely on AI technologies for their businesses and other organizations don't keep in mind that AI is subject to the biases of its programmers and its training data.

Yes, we are making great technological advances in the field of AI, and yes, it is a helpful tool that is getting better, but we as humans must steer it correctly and not offload our cognitive responsibilities onto it.

I put one of the sample questions from Kevin Roose's recent column about "Humanity's Last Exam" (the physics question) (https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html) into DeepSeek, and it worked for a staggering 600+ seconds, walking through about a dozen different permutations of how to approach the problem (I had no idea if any of them made sense). It all looked very impressive. The net result: the answer it came up with was 8. The next day, to show a friend, I put in the exact same prompt. Again, it worked for more than 10 minutes - the "reasoning" process was longer than any output I've ever gotten from an LLM - and the final answer? 10! So, obviously, the test has not been passed.

One issue, of course, is that given the specialization of the problem, the average user would have no idea how to evaluate how it arrived at its response. I would imagine an expert would, though - it might be interesting to see where it went wrong and for a human to parse the reasoning (as Marc does above) step by step to see what happened. I've read that this aspect of AI can be useful in and of itself because it might open up attempted solutions no human would ever consider (here, I think of the AlphaGo example, which produced an entirely novel strategy to win against a Go expert) and therefore offer experts ideas about how to approach previously unsolvable problems. But that strikes me as a very different purpose than the average user is expecting.

As for the anthropomorphizing, you have AI bloggers like Ethan Mollick essentially telling people to treat AI as a person for the purposes of getting results, so there are clearly mixed messages. From my own experience, given some of the eerie conversations I've had, it can be challenging NOT to fall into the trap of feeling like you're talking to an alien intelligence, no matter how much you know otherwise. It defies reason at times to think it's just a "prediction" machine, even though we know that's all it is. Where does that leave us?

As a teacher, I see the issue of student use of AI as even more complicated than it was two years ago, not only because the students are getting more sophisticated, but because it's becoming the go-to - it's the new "Google," as one of my students said to me. When teachers do not address it - and most don't - it's just going to metastasize, and the kids will be making up the rules.

Mollick has done a fair amount of damage in this arena of anthropomorphism and over-reliance on LLMs. It has been depressing to watch his popularity grow because I think he's driving a lot of people in the wrong direction.

Words matter. Names matter. Marketers know this.

Thanks Marc. I like the focus here on deconstructing the chain-of-thought trace features of these models, which is imo the most important part of their implementation. For my use cases, I don't find that these models produce better results than their non-reasoning counterparts, despite all the hype. I do, however, appreciate the minimal amount of transparency they afford, so that I at least get some breadcrumbs to investigate should I need to interrogate their responses more closely.

Thanks for a thought provoking article. The way we anthropomorphize AI (and tech in general) has some serious implications. I still struggle with my own mental models about AI and I suspect I'm not alone.

Great piece, Marc. It affirms my sense that we need analogies and metaphors different from those of the human mind. The initial enthusiastic response to each new model always imagines that the gap between what the machine can do (quite impressive in this case!) and what a human would do is slight. It feels close if you imagine the machine is thinking. How hard can it be to get the model's outputs to agree with reality or truth?

The answer is that it is quite hard. The model does not think in a way that fits the analogy of the human mind's processes. It does not have an understanding of reality or truth outside the vectors it uses to manipulate language. Layering on post-training to get it to answer questions more reliably is not going to fix its fundamental nature, which is grounded in probabilistic mathematics applied to language, not a human understanding of reality.

My comment may be caffeine-fueled nonsense, but I wonder if we should start from the assumption that human reasoning is the bar to be met. In other words, should the goal be human-like reasoning, or should it be something different? (Sorry if this is off-point, but I've been hammering my doctoral students about assumptions lately, so I have them on my mind!)

I don't think this is nonsense at all. I think analogizing artificial intelligence to human reasoning has been the dominant framework since the 1950s. It is right there in the name, and it is the underlying assumption for the Turing Test. That frame has been really useful in developing technologies from Eliza to ChatGPT.

Like any analogy, it has limits and blind spots. I'm more interested in exploring alternatives, which will have their own limits, than continuing to imagine we are building mechanical turks.

The confusion between machine and human logic is one of the main reasons Weizenbaum became such a vocal critic of computers after his ELIZA experiment. He realized how gullible people were and was shocked that even his secretary thought there was some intelligence in the machine.

The introduction to Computer Power and Human Reason is such a perfect introduction to natural language and computing. I'm hoping it survives the cut as I finalize the reading for my fall class on AI.
