Hacker News
by FanaHOVA 1038 days ago
Yep, +1. That's why I used the quotes. :) Thanks for expanding!

Yep, I understood that you were using it informally; just trying to keep things informative for other folks reading too.
There frankly needs to be a paper calling this out, though, because at this point a bunch of industry models are following “llama laws” and nobody’s really done the research. It’s all monkey see, monkey do.
But what would they be calling out?

If industry groups want to run a training run based on the configurations of a well-performing model, I don't see anything wrong with that. Now, if they were to claim that what they are doing is somehow "optimal", then there would be something to criticize.

Poor choice of words; I probably mean sketching out the curves/doing ablation studies in a comprehensive way, like the Chinchilla paper did.
Makes sense! But expensive...
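For anyone wondering what "sketching out the curves" looks like, here is a rough sketch of an IsoFLOP sweep in the spirit of the Chinchilla paper. It rests on assumptions that are not from this thread: training compute is approximated as C ≈ 6·N·D FLOPs, and the loss follows the parametric fit L(N, D) = E + A/N^α + B/D^β with constants roughly as reported by Hoffmann et al. (2022). The size range, number of points, and function names are hypothetical. In a real study every point on the curve is an actual training run, and the constants are fitted from those runs rather than assumed.

```python
"""Hypothetical IsoFLOP sweep in the spirit of the Chinchilla paper.

Assumptions (not from the thread): training compute C ~= 6 * N * D FLOPs,
and the parametric loss fit L(N, D) = E + A / N**alpha + B / D**beta with
constants roughly as reported by Hoffmann et al. (2022). A real study would
fit these constants from actual training runs instead of assuming them.
"""

import math

# Assumed fit constants (illustrative, not re-derived here).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted final loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def isoflop_sweep(compute: float, num_points: int = 41):
    """Evaluate model sizes log-spaced over a hypothetical range at fixed compute.

    Returns (n_params, n_tokens, loss) tuples. Every point costs the same
    `compute` FLOPs, so the whole sweep costs num_points * compute.
    """
    lo, hi = math.log10(1e7), math.log10(1e12)  # hypothetical size range
    points = []
    for i in range(num_points):
        n = 10 ** (lo + (hi - lo) * i / (num_points - 1))
        d = compute / (6 * n)  # tokens affordable at this size under C ~= 6ND
        points.append((n, d, loss(n, d)))
    return points


def closed_form_optimum(compute: float):
    """Analytic minimiser of loss(N, C / (6N)) for the same assumed fit.

    Setting dL/dN = 0 gives N_opt = G * (C / 6) ** (beta / (alpha + beta))
    with G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    g = (ALPHA * A / (BETA * B)) ** (1 / (ALPHA + BETA))
    a = BETA / (ALPHA + BETA)
    n_opt = g * (compute / 6) ** a
    return n_opt, compute / (6 * n_opt)


if __name__ == "__main__":
    budget = 1e23  # hypothetical compute budget in FLOPs
    sweep = isoflop_sweep(budget)
    best_n, best_d, best_loss = min(sweep, key=lambda p: p[2])
    n_opt, d_opt = closed_form_optimum(budget)
    print(f"grid best:   N={best_n:.3g}, D={best_d:.3g}, loss={best_loss:.4f}")
    print(f"closed form: N={n_opt:.3g}, D={d_opt:.3g}, tokens/param={d_opt / n_opt:.1f}")
    print(f"sweep cost:  {len(sweep) * budget:.3g} FLOPs for one budget")
```

The last line is where the "expensive" part shows up: each point on one IsoFLOP curve is a full training run at budget C, so even this single sweep costs num_points × C, and the Chinchilla paper repeated the exercise across several budgets before extrapolating.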