by swyx 1033 days ago
there frankly needs to be a paper calling this out tho, because at this point there are a bunch of industry models following “llama laws” and nobody’s really done the research, it’s all monkey see monkey do

But what would they be calling out?

If industry groups want to run a training run based on the configurations of a well-performing model, I don't see anything wrong with that. Now, if they were to claim that what they are doing is somehow "optimal", then there would be something to criticize.

poor choice of words, i probably mean sketching out the curves/doing ablation studies in a comprehensive way like the chinchilla paper did.
Makes sense! But expensive...