Recently, Apple published a paper on LRMs (Large Reasoning Models) which found that "LRMs have limitations in exact computation" and that "they fail to use explicit algorithms and reason inconsistently across puzzles." I would consider this a death blow to the current push for using LLMs and LRMs as the basis for AGI. Subbarao Kambhampati and Yann LeCun seem to agree. You could say the paper knocked out LLMs.

More recently, a comment paper appeared on arXiv and was shared around X as a rebuttal to Apple's paper. Putting aside the stunt of listing Claude Opus as a co-author (yes, I'm not kidding), the paper is a poor rebuttal for many reasons, which we shall explore, but mainly for missing the entire point of Apple's paper and of prior research by AI researchers such as Professor Kambhampati.

## Mathematical Errors

First, the rebuttal makes some key mathematical errors. As Andreas Kirsch points out on X, it claims that token growth is predicted by:

$$ T(N) \approx 5(2^N - 1)^2 + C $$

that is, a token count growing quadratically in the number of moves for a Tower of Hanoi solution. In reality, token growth is linear in the number of moves required to solve the puzzle. In fact, Gemini 2.5 Pro outputs a solution in under 10k tokens for \(N = 10\) discs.

See Andreas's post on X.

## Confusing Mechanical Execution with Reasoning Complexity

The rebuttal also conflates solution length with computational difficulty. As Kirsch points out, different puzzles have vastly different complexity profiles:

- Tower of Hanoi: requires \(2^N - 1\) moves, but has branching factor 1 and requires no search, just mechanical execution with a "trivial \(O(1)\) decision process per move."
- River Crossing: requires ~\(4N\) moves, but has branching factor >4 and is NP-hard, requiring genuine constraint satisfaction and search.

This explains why models might execute 100+ Hanoi moves while failing on 5-move River Crossing problems: the latter requires genuine reasoning, while the former is mere mechanical execution. The two sketches below illustrate the contrast.
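To see why linear growth is the right expectation, here is a minimal Python sketch (my own illustration, not code from either paper) that enumerates the Hanoi moves with the standard recursive rule. The tokens-per-move constant is an assumption for illustration only.

```python
def hanoi_moves(n, src="A", aux="B", dst="C"):
    """Enumerate the 2**n - 1 Tower of Hanoi moves.

    Every move is produced by a fixed recursive rule: there is
    exactly one continuation at each step (branching factor 1),
    so no search is ever performed.
    """
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, dst, aux)     # shift n-1 discs out of the way
            + [(n, src, dst)]                     # move the largest disc
            + hanoi_moves(n - 1, aux, src, dst))  # shift n-1 discs back on top

if __name__ == "__main__":
    TOKENS_PER_MOVE = 8  # assumed constant cost to emit one move
    for n in (3, 10):
        moves = hanoi_moves(n)
        # Token cost scales linearly with the move count, not
        # quadratically as the rebuttal's T(N) formula implies.
        print(f"n={n}: {len(moves)} moves, ~{TOKENS_PER_MOVE * len(moves)} tokens")
```

Under that assumed constant, \(N = 10\) gives 1023 moves and roughly 8k tokens: consistent with the Gemini 2.5 Pro observation above, and nowhere near the rebuttal's quadratic prediction.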
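By contrast, here is a sketch of River Crossing as a search problem. Apple's paper uses an actors-and-agents variant; for brevity this sketch uses the classic missionaries-and-cannibals formulation, which has an analogous constraint structure, with parameters of my own choosing. The point is the branching: several moves are legal at every state, so a solver has to explore and prune.

```python
from collections import deque

def river_crossing_bfs(m=3, c=3, cap=2):
    """Breadth-first search over river-crossing states.

    State: (missionaries on left, cannibals on left, boat on left).
    Unlike Hanoi, up to five moves are legal at each state, so the
    solver must explore a branching tree and prune unsafe states.
    """
    def safe(ml, cl):
        mr, cr = m - ml, c - cl
        # A bank is unsafe if cannibals outnumber missionaries there.
        return not (ml and cl > ml) and not (mr and cr > mr)

    start, goal = (m, c, True), (0, 0, False)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        (ml, cl, left), path = frontier.popleft()
        if (ml, cl, left) == goal:
            return path
        sign = -1 if left else 1            # crossing direction of the boat
        for dm in range(cap + 1):           # missionaries in the boat
            for dc in range(cap + 1 - dm):  # cannibals in the boat
                if dm + dc == 0:            # someone must row
                    continue
                # Passengers must be available on the boat's bank.
                avail_m = ml if left else m - ml
                avail_c = cl if left else c - cl
                if dm > avail_m or dc > avail_c:
                    continue
                state = (ml + sign * dm, cl + sign * dc, not left)
                if safe(state[0], state[1]) and state not in seen:
                    seen.add(state)
                    frontier.append((state, path + [(dm, dc)]))
    return None  # no safe sequence of crossings exists

if __name__ == "__main__":
    plan = river_crossing_bfs()
    print(f"solved in {len(plan)} crossings: {plan}")
```

Run as-is, it finds the classic 11-crossing plan. The contrast with the Hanoi generator, which never branches, is exactly the distinction the rebuttal glosses over.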