by sigmoid10
832 days ago
It's not just the volume of original data that matters here. Empirically, performance scales roughly with (model parameters) × (training data) × (epochs). Increase any one of those and you can expect your model to improve. In the short term, training data volume and quality have driven a lot of the gains (especially recently), but in the long run it has always been model size and total training time that delivered the improvements. In other words, it doesn't matter how you allocate your extra compute budget, as long as you spend it.
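
The product above is essentially total training compute. A minimal sketch, assuming the common rule of thumb C ≈ 6·N·D FLOPs (N parameters, D tokens processed, so D = tokens per epoch × epochs). That rule is an outside approximation, not something the comment states, and the allocations below are hypothetical numbers chosen only for illustration:

```python
def training_flops(params: float, tokens_per_epoch: float, epochs: float) -> float:
    """Approximate training compute using the C ~ 6*N*D rule of thumb (an assumption),
    where D is the total number of tokens processed: tokens_per_epoch * epochs."""
    return 6.0 * params * tokens_per_epoch * epochs


# Hypothetical ways to spend the same extra budget (illustrative numbers only):
# double the model, double the data, or double the epochs.
allocations = {
    "bigger model": (2e9, 1e10, 1),
    "more data": (1e9, 2e10, 1),
    "more epochs": (1e9, 1e10, 2),
}

for name, (n, d, e) in allocations.items():
    print(f"{name:>12}: {training_flops(n, d, e):.2e} FLOPs")
```

All three allocations come out to the same compute (1.20e+20 FLOPs) under this approximation. That only shows they cost the same; it says nothing on its own about which one yields the better model.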
Is there any reason to think the same thing wouldn't happen in billion-parameter LLMs?