by senko 584 days ago
No.

The scaling laws may be dead. Does this mean the end of LLM advances? Absolutely not.

There are many different ways to improve LLM capabilities. Everyone was mostly focused on the scaling laws because that worked extremely well (actually surprising most of the researchers).

But if you're keeping an eye on the scientific papers coming out about AI, you've seen the astounding amount of research going on with some very good results, that'll probably take at least several months to trickle down to production systems. Thousands of extremely bright people in AI labs all across the world are working on finding the next trick that boosts AI.

One random example is test-time compute: just give the AI more time to think. This is basically what o1 does. A recent research paper suggests using it is roughly equivalent to an order of magnitude more parameters, performance-wise. (source for the curious: https://lnkd.in/duDST65P)

Another example that sounds bonkers but apparently works is quantization: reducing the precision of each parameter to 1.58 bits (i.e. only using the values -1, 0, 1). This uses about 10x less space for the same parameter count (compared to the standard 16-bit format), and since AI operations are actually memory-limited, it directly corresponds to a 10x decrease in costs: https://lnkd.in/ddvuzaYp
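For anyone wondering where "1.58 bits" and "10x" come from, here is a minimal sketch of the arithmetic. The `ternary_quantize` helper is hypothetical and only illustrates the idea (scale by the mean absolute weight, round, clip to -1..1); it is an assumption for illustration, not the method from the linked paper.

```python
import math

def bits_per_ternary_weight() -> float:
    # Three possible values (-1, 0, 1) carry log2(3) bits of information.
    return math.log2(3)

def ternary_quantize(weights):
    """Hypothetical helper: map floats to {-1, 0, 1} plus one shared scale."""
    # Scale by the mean absolute value (fall back to 1.0 for all-zero input),
    # then round each weight and clip it to the range [-1, 1].
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

bits = bits_per_ternary_weight()   # about 1.585
ratio = 16 / bits                  # about 10.1x smaller than 16-bit weights
q, s = ternary_quantize([0.9, -0.05, -1.2, 0.4])
print(f"{bits:.3f} bits per weight, {ratio:.1f}x smaller, quantized: {q}")
```

Note that this only covers the storage side; whether the quantized model keeps its quality is the part the paper has to demonstrate.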

(Quite apart from improvements like these, we shouldn't forget that not all AIs are LLMs. There's been tremendous advance in AI systems for image, audio and video generation, interpretation and manipulation, and they also don't show signs of stopping. There's also a possibility that a new or hybrid architecture for textual AI might be developed.)

AI winter is a long way off.

2 comments

Scaling laws are not dead. The number of people predicting death of Moore's law doubles every two years.

- Jim Keller

https://www.youtube.com/live/oIG9ztQw2Gc?si=oaK2zjSBxq2N-zj1...

There are way too many personal definitions of what "Moore's Law" even is to have a discussion without deciding on a shared definition beforehand.

But Goodhart's law, "When a measure becomes a target, it ceases to be a good measure", directly applies here: Moore's Law was used to set long-term plans at semiconductor companies, and Moore didn't have empirical evidence it was even going to continue.

If you arbitrarily pick, say, CPU performance, or worse, single-core performance, as your measurement, it hasn't held for well over a decade.

If you pick minimum feature size, without regard to cost, it is still holding.

What you want to prove usually dictates what interpretation you make.

That said, the scaling law is still unknown, but you can game it as much as you want in similar ways.

GPT-4 was already hinting at an asymptote on MMLU, but the question is whether it is valid for real work, etc.

Time will tell. I am seeing far less optimism from my sources, though that is just anecdotal.

Moore's law is doomed. At some point you start reaching the level of individual atoms. This is just physics.
You are missing the economic component. It isn't just how small a transistor can be; it was really about how many transistors you can get for your money. So even when we reach terminal density, we probably haven't reached terminal economics.
I didn't say we have currently reached a limit. I am saying that there obviously is a limit (at some point). So, scaling cannot go on forever. This is a counterpoint to the dubious analogy with deep learning.
The limits are engineering, not physics. Atoms need not be a barrier for a long time if you can go fully 3D, for example, but manufacturing challenges, power and heat get in the way long before that.

Then you can go ultra-wide in terms of cores, dispatchers and vectors (essentially building bigger and bigger chips), but an algorithm which can't exploit that will be little faster on today's chips than on a 4790K from ten years ago.

> Everyone was mostly focused on the scaling laws because that worked extremely well

Also because it was easy, and expense was not the first concern.