The Illusion of "The Illusion of Thinking"

Very recently (early June 2025), Apple released a paper called The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. It has been widely understood to demonstrate that reasoning models don't "actually" reason. I do not believe that AI language models are on the path to superintelligence, but I still don't like this paper very much. What does it really show? And what does that mean for how we should think about language models?

What does the paper demonstrate?

The Apple paper starts by arguing that we shouldn't care how good reasoning models are at mathematics and coding benchmarks, because (a) those benchmarks are contaminated, and (b) you can't run good experiments on mathematics and coding tasks, since there is no easy measure of their complexity. Instead, they evaluate reasoning models on four artificial puzzle environments (Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World), scaling up from trivial instances like Tower of Hanoi with one disk to Tower of Hanoi with twenty disks.

Here's an example where they compare the non-reasoning DeepSeek-V3 with the reasoning DeepSeek-R1:

[Figure: accuracy of DeepSeek-V3 vs. DeepSeek-R1 as puzzle complexity increases]

This pattern was basically the same for all pairs of reasoning/non-reasoning models and all puzzles. Here are the big conclusions the paper draws from it:

- For very simple puzzles, non-reasoning models are equal or better, because reasoning models sometimes "overthink" themselves into a wrong answer.
- For middle-difficulty puzzles, reasoning models are notably better.
- Once the difficulty gets sufficiently high, even the reasoning model fails to answer correctly, no matter how much time you give it.

The paper goes on to examine the internal reasoning traces of the reasoning models, which support the conclusions above: as you might expect, the correct answer shows up almost immediately for trivial problems, takes more reasoning for harder problems, and never shows up at all for the hardest. The paper notes that as you ramp up complexity, once ...
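To make the paper's "complexity" axis concrete, here is a minimal Python sketch (my own illustration, not code from the paper) of the classic recursive Tower of Hanoi solution. The key fact is that the optimal solution for n disks takes 2^n - 1 moves, so each extra disk doubles the amount of output a model has to get exactly right.

```python
# Minimal sketch (not from the Apple paper): the textbook recursive
# Tower of Hanoi solver, used here only to show how the optimal
# solution length explodes as the disk count grows.

def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the optimal move sequence for n disks as (from, to) pairs."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller disks
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks
    return moves

for n in (1, 3, 10, 20):
    assert len(hanoi(n)) == 2**n - 1  # optimal length is 2^n - 1 moves
    print(f"{n:>2} disks -> {2**n - 1:>9,} moves")
#  1 disks ->         1 moves
#  3 disks ->         7 moves
# 10 disks ->     1,023 moves
# 20 disks -> 1,048,575 moves
```

A one-disk puzzle is solved in a single move, while the twenty-disk version needs over a million; that exponential gap is what the paper leans on when it scales the puzzles from trivial to hard.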
