by v64 1018 days ago
The Llama 1 paper [1] was one of the earlier works to question the assumption that more params = better model. Since then they've released Llama 2, and this post offers more evidence for their hypothesis.

I wouldn't call it an oversight that other labs missed this. It's easier to just increase params on a model over the same training set than to gather the larger training set that a smaller model needs. And at first, increasing model size did seem to be the way forward, but we've since hit diminishing returns. Now that we've hit that point, we've begun exploring other options, and the Llamas are early evidence of another way forward.

[1] https://arxiv.org/abs/2302.13971