Failing to Understand the Exponential, Again

https://news.ycombinator.com/rss
Summary

The current discourse around AI progress and a supposed “bubble” reminds me a lot of the early weeks of the Covid-19 pandemic. Long after the timing and scale of the coming global pandemic were obvious from extrapolating the exponential trends, politicians, journalists and most public commentators kept treating it as a remote possibility or a localized phenomenon. Something similarly bizarre is happening with AI capabilities and further progress. People notice that while AI can now write programs, design websites, etc., it still often makes mistakes or goes in the wrong direction, and then they somehow jump to the conclusion that AI will never be able to do these tasks at human levels, or will only have a minor impact. Yet just a few years ago, having AI do these things was complete science fiction! Or they see two consecutive model releases, don’t notice much difference in their conversations, and conclude that AI is plateauing and scaling is over.

METR

Accurately evaluating AI progress is hard, and commonly requires a combination of AI expertise and subject-matter understanding. Fortunately, there are entire organizations like METR whose sole purpose is to study AI capabilities! We can turn to their recent study "Measuring AI Ability to Complete Long Tasks", which measures the length of software engineering tasks models can autonomously perform. We can observe a clear exponential trend in their results, with Sonnet 3.7 achieving the best performance by completing tasks up to an hour in length at a 50% success rate.

However, at this point Sonnet 3.7 is 7 months old, coincidentally the same as the doubling time claimed by METR in their study. Can we use this to verify whether METR's findings hold up? Yes! In fact, METR themselves keep an up-to-date plot on their study website, where we can see the addition of recent models such as Grok 4, Opus 4.1, and GPT-5 at the top right of the graph.
Not only did the prediction hold up, but these recent models are actually slightly above trend, now pe...
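To make the extrapolation concrete, here is a minimal Python sketch, assuming the METR trend is a pure exponential with a fixed doubling time. The roughly 1-hour starting horizon for Sonnet 3.7 and the 7-month doubling time come from the text; the helpers `projected_horizon` and `implied_doubling_time` are hypothetical names for illustration, not part of any METR tool.

```python
import math


def projected_horizon(h0_minutes: float, doubling_months: float, elapsed_months: float) -> float:
    """Task horizon after `elapsed_months`, assuming pure exponential growth
    with a fixed doubling time T: h(t) = h0 * 2 ** (t / T)."""
    return h0_minutes * 2 ** (elapsed_months / doubling_months)


def implied_doubling_time(h1_minutes: float, h2_minutes: float, elapsed_months: float) -> float:
    """Doubling time (in months) implied by two horizon measurements taken
    `elapsed_months` apart: T = dt * ln(2) / ln(h2 / h1)."""
    return elapsed_months * math.log(2) / math.log(h2_minutes / h1_minutes)


# Starting point from the text: about 60 minutes at 50% success (Sonnet 3.7),
# extrapolated 7 months forward with a 7-month doubling time.
print(projected_horizon(60, 7, 7))  # 120.0
```

Under these assumptions, the trend predicted that models released about 7 months after Sonnet 3.7 should reach a horizon of roughly two hours at a 50% success rate. The second helper runs the calculation in reverse, recovering the doubling time implied by any two measured horizons.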

First seen: 2025-09-28 14:27

Last seen: 2025-09-29 09:31