| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by duchenne 913 days ago

The most important paper to understand this issue is "Sacling Laws of Neural Language Models" by Open AI in 2020 [1]. Many consider it the most important paper that predicted the high performance of modern LLMs.

This paper shows how the loss decreases when you increase the model size, compute, or training dataset size.

From the article:

> Convergence is inefficient: When working within a fixed compute budget C but without any other restrictions on the model size N or available data D, we attain optimal performance by training very large models and stopping significantly short of convergence.

It clearly states that when you are limited by your training time compute, you should under-train your model.

[1] https://arxiv.org/abs/2001.08361

1 comments

swyx 913 days ago

that paper is now considered to be a psyop fwiw - but in the direction of too little data, not too many layers

link

highfrequency 913 days ago

Can you clarify what you mean?

link

versteegen 911 days ago

Because the training data/model size/compute tradeoff derived from that paper is highly suboptimal (too many parameters) compared to the ones from the later Deepmind scaling laws [1]. And then Meta researchers recommended using even smaller models, to trade-off training- and inference-time compute [2] (which I thought was pretty obvious if you care about more than just benchmarks).

[1] https://arxiv.org/abs/2203.15556 Training Compute-Optimal Large Language Models

[2] https://arxiv.org/abs/2302.13971 LLaMA: Open and Efficient Foundation Language Models

link

kristianp 912 days ago

He seems to be implying that openai released that paper to throw others off the scent of the direction they were taking.

link