HN Mirror

by arugulum 1038 days ago

I want to jump in and correct your usage of "LLaMA Laws" (even though you are using it informally, I just want to clarify).

There is no "LLaMA scaling law". There is a set of LLaMA training configurations.

Scaling laws describe the relationship between training compute, data, and expected loss (performance). Kaplan et al. estimated one set of laws, and the Chinchilla folks refined that estimate (mainly improving it by adjusting the learning rate schedule).

The LLaMA papers do not posit any new law, nor do they contradict any prior one. They chose a specific training configuration that still abides by the scaling laws but with a different goal in mind.

(Put another way: a scaling law doesn't tell you what configuration to train on. It tells you what to expect given a configuration, but you're free to decide on whatever configuration you want.)
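To make that concrete, here is a minimal sketch of using a scaling law as a predictor rather than a prescription. It assumes the parametric form and fitted constants reported in the Chinchilla paper (L(N, D) = E + A/N^alpha + B/D^beta) and the common approximation that training compute is about 6 * N * D FLOPs. The function names and the two example configurations are hypothetical, chosen only for illustration; they are not the LLaMA configurations.

```python
# Sketch: a scaling law predicts loss for a configuration you pick.
# Constants are the parametric fit reported in the Chinchilla paper
# (Hoffmann et al., 2022); treat them as an assumption, not ground truth.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, using the common C ~ 6 * N * D rule."""
    return 6.0 * n_params * n_tokens


# Two hypothetical configurations with the same compute budget.
configs = {
    "smaller model, more tokens": (7e9, 1.0e12),
    "larger model, fewer tokens": (14e9, 0.5e12),
}

for name, (n, d) in configs.items():
    print(f"{name}: FLOPs={train_flops(n, d):.2e}, "
          f"predicted loss={predicted_loss(n, d):.3f}")
```

Both configurations are valid points on the same law. Which one you train depends on your goal (for example, a smaller model that is cheaper to serve), not on the law itself.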

2 comments

npsomaratna 1038 days ago

Isn't the Chinchilla estimate considered to be wrong now?

https://espadrine.github.io/blog/posts/chinchilla-s-death.ht...

FanaHOVA 1038 days ago

Yep, +1. That's why I used the quotes. :) Thanks for expanding!

arugulum 1038 days ago

Yep, I understood that you were using it informally; I'm just trying to keep things informative for other folks reading too.

swyx 1038 days ago

there frankly needs to be a paper calling this out tho, because at this point there are a bunch of industry models following "llama laws" and nobody's really done the research, it's all monkey see monkey do

arugulum 1038 days ago

But what would they be calling out?

If industry groups want to run a training run based on the configurations of a well-performing model, I don't see anything wrong with that. Now, if they were to claim that what they are doing is somehow "optimal", then there would be something to criticize.

swyx 1038 days ago

poor choice of words, i probably mean sketching out the curves/doing ablation studies in a comprehensive way like the chinchilla paper did.

arugulum 1038 days ago

Makes sense! But expensive...
