Why Are the Critical Value and Emergent Behavior of Large Language Models Fake?

https://news.ycombinator.com/rss Hits: 1
Summary

Why there are no emergent properties in Large Language Models. We heard a lot about emergent properties of Large Language Models (LLMs) last year. I will share my thoughts, and those of some other scientists, on why there are no emergent properties, and especially why the assumed critical value on which these so-called emergent properties are based is not substantial.

The excitement about emergent properties started with a paper by [1], where the authors show that scaling LLMs beyond a specific size (which they claim is critical) makes the system exhibit unexpected behavior, unexpected in the sense that the model was not thought capable of it, such as 'doing' arithmetic. In support of their claim, the graphs the authors provide display a sharp jump in the performance of the LLM in terms of accuracy. The problem with their demonstration is the following: they use logarithmic charts whose x axis represents the number of weights (i.e., the parameters of the neural network of the LLM in use) and is divided into equally spaced units 10^1, 10^2, 10^3, …, 10^10, 10^11. The sharp jump occurs between 10^10 and 10^11 on the chart. But this single-unit shift between 10^10 and 10^11 is in fact a multiplication of 10 billion by 10, which means an increase (i.e., a shift) of 90 billion! This kind of representation should have been done on a linear scale to avoid any misunderstanding of the rate of change of the system's behavior. If we redraw the graph from [1] on a linear scale, the rate of change appears almost constant [2]. The system then appears to evolve normally, as expected, and 10^10 no longer stands out as a critical and alarming boundary.

Besides, expanding the system by 90 billion weights means supporting it with far more training data than going from 1,000 parameters to 10,000 (an increase of 9K) or from 10,000 to 100,000 (an increase of 90K); those steps add little to the training repository compared with adding 90 billion parameters. For exa...
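To make the scale argument concrete, here is a minimal sketch. It is not the authors' code or data: the "accuracy" curve is a purely hypothetical saturating function invented for illustration. It prints the absolute parameter increase hidden behind each one-decade step on a log axis, and plots the same smooth curve against a logarithmic and a linear x axis.

    # Minimal sketch: a hypothetical, smoothly saturating "accuracy" curve plotted
    # on a logarithmic and a linear x axis. The numbers are invented for
    # illustration; the point is only how the choice of axis changes the picture.
    import numpy as np
    import matplotlib.pyplot as plt

    # Each one-decade step on a log axis hides a very different absolute increase.
    for exponent in (4, 5, 11):
        step = 10**exponent - 10**(exponent - 1)
        print(f"10^{exponent - 1} -> 10^{exponent}: +{step:,} parameters")
    # 10^3  -> 10^4 :            +9,000
    # 10^4  -> 10^5 :           +90,000
    # 10^10 -> 10^11: +90,000,000,000

    params = np.logspace(1, 11, 400)        # 10^1 ... 10^11 parameters (hypothetical)
    accuracy = params / (params + 5e10)     # smooth curve with no built-in threshold

    fig, (ax_log, ax_lin) = plt.subplots(1, 2, figsize=(10, 4))

    ax_log.plot(params, accuracy)
    ax_log.set_xscale("log")                # equal horizontal spacing per decade
    ax_log.set_xlabel("parameters (log scale)")
    ax_log.set_ylabel("accuracy")
    ax_log.set_title("log axis: rise looks abrupt past 10^10")

    ax_lin.plot(params, accuracy)
    ax_lin.set_xlabel("parameters (linear scale)")
    ax_lin.set_ylabel("accuracy")
    ax_lin.set_title("linear axis: rise looks gradual")

    plt.tight_layout()
    plt.show()

On the log plot the final decade occupies the same horizontal width as the step from 10^3 to 10^4, so a 90-billion-parameter increase is visually compressed into one unit and the curve seems to jump; on the linear plot the very same curve shows no critical threshold.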

First seen: 2025-04-21 12:35

Last seen: 2025-04-21 12:35