by dwaltrip 937 days ago
For the current generative AI wave, this is how I understand it:

1. The scaling path is decreased val/test loss during training.

2. We have seen multiple times that large decreases in this loss have resulted in very impressive improvements in model capability across a diverse set of tasks (e.g. GPT-1 through GPT-4, and many other examples).

3. By now, there is a lot of robust data demonstrating really nice relationships between decreased loss and model size, quantity of data, length of training, quality of data, etc. Evidence keeps building that most multi-billion param LLMs are probably undertrained, perhaps significantly so.

4. Ergo, we should expect continued capability improvement with continued scaling. Make a bigger model, get more data, get higher data quality, and/or train for longer and we will see improved capabilities. The graphs demand that it is so.
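
To make the "extend the line" reasoning concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the data points are synthetic, the pure power-law form loss = a * compute^(-b) is a simplification (it has no irreducible-loss floor), and `fit_power_law` is a made-up helper name, not anything the labs actually use.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-b), done as a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    # Ordinary least-squares slope and intercept on the log-transformed points.
    slope = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
             / sum((u - mean_x) ** 2 for u in lx))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic (made-up) training runs: compute budget vs. validation loss.
compute = [1e18, 1e19, 1e20, 1e21]
loss = [10.0 * c ** -0.05 for c in compute]

a, b = fit_power_law(compute, loss)

# "Extending the line": predicted loss at 10x the largest compute budget seen.
predicted = a * 1e22 ** (-b)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at 1e22: {predicted:.3f}")
```

On a log-log plot this fit is a straight line, which is why the curves look so easy to extrapolate. Whether real curves stay on that line at larger scale is exactly the open question.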

---

This is the fundamental scaling hypothesis that labs like OpenAI and Anthropic have been operating on for the past 5+ years. They looked at the early versions of the curves mentioned above, extended the lines, and said, "Huh... These lines are so sharp. Why wouldn't it keep going? It seems like it would."

And they were right. The scaling curves may break at some point. But they don't show indications of that yet.

Lastly, all of this is largely just taking existing model architectures and scaling up. Neural nets are a very young technology. There will be better architectures in the future.

1 comment

We're at the point now where the harder problem is obtaining the high quality data you need for the initial training in sufficient quantities.
These European efforts to create competitive LLMs need to know that.
I don't think they will go anywhere. Europe doesn't have the ruthlessness required to compete in such an arena; it would need far more unification before that could happen. And we're only drifting further apart, it seems.