Is chain-of-thought AI reasoning a mirage?

https://news.ycombinator.com/rss Hits: 13
Summary

Reading research papers and articles about chain-of-thought reasoning makes me frustrated. There are many interesting questions to ask about chain-of-thought: how accurately it reflects the actual process going on, why training it “from scratch” often produces chains that switch fluidly between multiple languages, and so on. However, people keep asking the least interesting question possible: whether chain-of-thought reasoning is “really” reasoning. Apple took up this question in their Illusion of Thinking paper, which I’ve already written about. Now there’s a paper from Arizona State University that’s getting some attention, called Is Chain-of-Thought Reasoning of LLMs a Mirage? As will become clear, I do not think this is a very good paper.

What does the Arizona State paper argue? Here’s the core point:

    CoT reasoning works effectively when applied to in-distribution or near in-distribution data but becomes fragile and prone to failure even under moderate distribution shifts. In some cases, LLMs generate fluent yet logically inconsistent reasoning steps. The results suggest that what appears to be structured reasoning can be a mirage, emerging from memorized or interpolated patterns in the training data rather than logical inference.

The strategy of the paper is to train a small transformer model (~600k parameters) on a corpus of non-language data transformations. What does this mean? As far as I can tell, it means that when prompted with something like “A B C D [M1]”, the model should respond “B C D E”, where the “M1” operation in the training data means “advance each letter forward by one”. The training data contained several kinds of operation, which could be composed arbitrarily (e.g. “A B C D [M1] [M1]” should produce “C D E F”). Finally, the training data included chains-of-thought like:

    A B C D [M1] [M1]
    <think> B C D E [M1] </think>
    C D E F

Overall, the idea is to teach the model a very simple way of expressing chains-of-thought to solve toy alphabet problems, which has the good ef...
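To make the setup concrete, here is a rough Python sketch of what a generator for this kind of training example might look like. The operation tags and the serialization match the examples above, but the function names and other details are my own guesses, not code from the paper:

```python
# Hypothetical sketch of the paper's toy data setup, reconstructed from the
# description above; none of these names come from the paper itself.
import string

ALPHABET = string.ascii_uppercase

def apply_shift(tokens, k):
    """Advance each letter forward by k positions, wrapping at Z."""
    return [ALPHABET[(ALPHABET.index(t) + k) % len(ALPHABET)] for t in tokens]

# Each tag names one transformation; [M1] = shift by one, [M2] = shift by two.
OPS = {
    "[M1]": lambda toks: apply_shift(toks, 1),
    "[M2]": lambda toks: apply_shift(toks, 2),
}

def make_example(tokens, op_tags):
    """Return (prompt, chain-of-thought, answer) for a composed problem."""
    prompt = " ".join(tokens + list(op_tags))
    steps, state = [], tokens
    for i, tag in enumerate(op_tags):
        state = OPS[tag](state)
        remaining = op_tags[i + 1:]
        if remaining:  # intermediate states become the <think> trace
            steps.append(" ".join(state + list(remaining)))
    cot = "<think> " + " ".join(steps) + " </think>" if steps else ""
    answer = " ".join(state)
    return prompt, cot, answer

print(make_example(["A", "B", "C", "D"], ["[M1]", "[M1]"]))
# ('A B C D [M1] [M1]', '<think> B C D E [M1] </think>', 'C D E F')
```

The point of a scheme like this is that the intermediate line in the <think> block is itself a well-formed problem, so the model can in principle learn to solve composed operations step by step.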

First seen: 2025-08-14 15:16

Last seen: 2025-08-15 12:20