We shouldn’t need any illusions to understand how generative tools might be useful. This obsession with anthropomorphization hinders our ability to understand what these systems can and cannot do, leaving us with a confused and muddled idea of their capabilities. An LLM’s ability to predict patterns is impressive and quite useful in many contexts, but that doesn't make it conscious.
Nailed it. Well done, congratulations. See my latest for another data point.
Good text on an important topic.
Grounding it all in computer basics helps dissipate the illusion and demystify the process.
Prompt = input, command line
Inference = a program that converts the input string to retrieval coordinates in the model and iteratively samples the model to produce an output (see the sketch after this list)
Model = a resource file
Training = sampling across a huge body of text to perform lossy multi-level compression
LLM = misnomer, should be "large text model" as "language" implies non-existent semantics
Thinking = misnomer as it implies non-existent semantic abstraction
Reason = misnomer as it implies mechanics such as deduction and induction, and presence of non-existent ontologies, on top of the non-existent language and thinking
Hallucination = misnomer as it implies an inverted frame where syntax correlates with external referents rather than just statistical relations between different length runs of tokens
Token = misnomer as it implies a link from symbol to meaning. A token represents something, but in this context it represents nothing outside itself
And so on
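To make the "Training" and "Inference" entries above concrete, here is a deliberately tiny sketch in Python. It is not how a transformer works; it swaps the neural network for a bigram word counter purely to show the shape of the pipeline the list describes: training compresses text into statistics stored in a "model", and inference just samples that model one token at a time.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for the pipeline: a bigram word counter instead of a neural
# network. The scale and the math are completely different in a real LLM,
# but the shape of the process is the same.

def train(corpus: str) -> dict:
    """'Training': compress a body of text into next-token statistics."""
    tokens = corpus.split()
    model = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        model[current][nxt] += 1           # the 'model' is just stored counts
    return model

def generate(model: dict, prompt: str, max_new_tokens: int = 20) -> str:
    """'Inference': start from the prompt and iteratively sample the model."""
    tokens = prompt.split()                # 'tokens' are just symbols, nothing more
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:                 # no statistics for this token: stop
            break
        words, counts = zip(*candidates.items())
        tokens.append(random.choices(words, weights=counts)[0])
    return " ".join(tokens)

corpus = "the model predicts the next token and the next token follows the prompt"
bigram_model = train(corpus)
print(generate(bigram_model, "the"))       # prompt in, sampled continuation out
```

The real thing replaces the counter with a neural network over subword tokens, but the loop, and the glossary above, keep the same shape.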
Nice analogy between the Turk and the reasoning models.
Overall, I mostly agree with your argument here: the more we treat these models as reasoning agents, the more prone we are to fall for automation bias and cognitively offload critical thinking to AI.
A fundamental reason the idea of reasoning, pushed especially by OpenAI with the o1 model, is so prevalent and so heavily emphasized in their PR and marketing is that if the models were actually able to reason, it would make a better case that they are not infringing copyright.
If, on the other hand, the models are mostly pattern matching and a sophisticated form of information retrieval, then all the data used to train them is basically still in there, just vectorized (which is probably technically true), and hence there's a copyright issue.
Hence OpenAI's eagerness to frame the process as reasoning.
Honestly, I have a hard time finding a use case for reasoning AI. I get way better results when I control the "reasoning" of AI with prompts, knowledge bases, and chain of thought.
Reasoning AI is just AI trained to use chain of thought automatically.
Yeah … I’d rather do the chain of thought myself in most instances.
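For what it's worth, here is a minimal illustration of what "doing the chain of thought myself" can look like. The prompts and the question are invented for this example, and no particular model or API is assumed; the point is only that the intermediate steps are written by the user rather than generated by a "reasoning" model's post-training.

```python
# Hypothetical prompts, invented for illustration; no specific model or API is assumed.
question = "How many 250 ml glasses can I fill from a 2 L bottle?"

# What you might send to a "reasoning" model: just the bare question.
# The vendor's post-training inserts its own intermediate steps.
bare_prompt = question

# Controlling the chain of thought yourself: spell out the steps explicitly,
# so every assumption is visible and editable before the model sees it.
manual_cot_prompt = "\n".join([
    question,
    "Work through it step by step:",
    "1. Convert 2 L to millilitres.",
    "2. Divide that total by 250 ml.",
    "3. Give only the final whole number.",
])

print(manual_cot_prompt)
```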
Very well said, Marc. If I could draw a conclusion from this, it would be that the real danger of generative AI isn't super-human intelligence; it's giving our trust over to the mechanical Turk's untrustworthy decision-making output.
Thank you! I've seen articles on LinkedIn about how companies are using AI, and a common thread in many of them is that the writers try very hard to personify LLMs by saying they have human traits like curiosity and reasoning. As if the AI consciously follows a train of thought.
I am concerned that the people making the decisions to rely on AI technologies for their businesses and other organizations don't keep in mind that AI is subject to the biases of its programmers and its training data.
Yes, we are making great technological advances in the field of AI, and yes, it is a helpful tool that is getting better, but we as humans must steer it correctly and not offload our cognitive responsibilities onto it.
I put one of the sample questions from Kevin Roose's recent column about "Humanity's Last Exam" (the physics question) (https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html) into DeepSeek, and it worked for a staggering 600+ seconds, walking through about a dozen different permutations of how to approach the problem (I had no idea if any of them made sense). It all looked very impressive. The net result: the answer it came up with was 8. The next day, to show a friend, I put in the exact same prompt. Again, it worked for more than 10 minutes, and the "reasoning" process was longer than any output I've ever gotten from an LLM. The final answer? 10! So, obviously, the test has not been passed.

One issue, of course, is that given the specialization of the problem, the average user would have no idea how it arrived at its response. I would imagine an expert would, though; it might be interesting to see where it went wrong and for a human to parse the reasoning (as Marc does above) step by step to see what happened. I've read that this aspect of AI can be useful in and of itself because it might open up attempted solutions no human would ever consider (here I think of the AlphaGo example, which produced an entirely novel strategy to win that game against a Go expert) and therefore offer experts ideas about how to approach previously unsolvable problems. But that strikes me as a very different purpose than the one the average user is expecting.

As for the anthropomorphizing, you have AI bloggers like Ethan Mollick essentially telling people to treat AI as a person for the purposes of getting results. So there are clearly mixed messages. From my own experience, given some of the eerie conversations I've had, it can be challenging NOT to fall into the trap of feeling like you're talking to an alien intelligence, no matter how much you know otherwise. It defies reason at times to think it's just a "prediction" machine, even though we know that's all it is.

Where does that leave us? As a teacher, I see the issue of student use of AI as even more complicated than it was two years ago, not only because the students are getting more sophisticated, but because it's becoming the common go-to; it's the new "Google," as one of my students said to me. When teachers do not address it (and most don't), it's just going to metastasize and the kids will be making up the rules.
Mollick has done a fair amount of damage in this arena of anthropomorphism and over-reliance on LLMs. It has been depressing to watch his popularity grow because I think he's driving a lot of people in the wrong direction.
Words matter. Names matter. Marketers know this.
Thanks Marc. I like the focus here on deconstructing the chain of thought trace features of these models, which is imo the most important part of their implementation. For my use cases, I don't find that these models give better results than their non-reasoning counterparts, despite how many people are hyping them. I do, however, appreciate the minimal amount of transparency they afford, so that I at least get some breadcrumbs to investigate should I need to interrogate their responses more closely.
Thanks for a thought provoking article. The way we anthropomorphize AI (and tech in general) has some serious implications. I still struggle with my own mental models about AI and I suspect I'm not alone.
Great piece, Marc. It affirms my sense that we need analogies and metaphors different from those of the human mind. The initial enthusiastic response to the new models always imagines the gap between what the machine can do (quite impressive in this case!) and what a human would do as slight. It feels close if you imagine the machine is thinking. How hard can it be to have the model's outputs agree with reality or truth?
The answer is that it is quite hard. The model does not think in a way that fits the analogy of the human mind's processes. It does not have an understanding of reality or truth outside the vectors it uses to manipulate language. Layering on post-training to get it to answer questions more reliably is not going to fix its fundamental nature, which is grounded in probabilistic mathematics applied to language, not a human understanding of reality.
My comment may be caffeine-fueled nonsense, but I wonder if we should start from the assumption that human reasoning is the bar to be met. In other words, should the goal be human-like reasoning, or should it be something different? (Sorry if this is off-point, but I've been hammering my doctoral students about assumptions lately, so I have them on my mind!)
I don't think this is nonsense at all. I think analogizing artificial intelligence to human reasoning has been the dominant framework since the 1950s. It is right there in the name, and it is the underlying assumption for the Turing Test. That frame has been really useful in developing technologies from Eliza to ChatGPT.
Like any analogy, it has limits and blind spots. I'm more interested in exploring alternatives, which will have their own limits, than continuing to imagine we are building mechanical turks.
The confusion between machine and human logic is one of the main reasons Weizenbaum became such a vocal critic of computers after his ELIZA experiment. He realized how gullible people were and was shocked that even his secretary thought there was some intelligence in the machine.
The introduction to Computer Power and Human Reason is such a perfect introduction to natural language and computing. I'm hoping it survives the cut as I finalize the reading for my fall class on AI.