HN Mirror



	by pellucide 791 days ago
	Somewhere I read that the 8B llama2 model could be undertrained by 100-1000x. So is it possible to train a model with 8B/100 = 80M parameters to perform as well as the llama2 8B model, given enough training time and training tokens?

1 comments

modeless 791 days ago

It's unclear. It might take a larger dataset than actually exists, or more compute than is practical. Or there may be a limit that we just haven't reached yet; this actually seems quite likely. The scaling "laws" are really more like guidelines and they are likely wrong when extrapolated too far.
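
As a rough illustration of the "limit" point, here is a sketch that plugs the question's numbers into one published scaling fit. It assumes the Chinchilla parametric form L(N, D) = E + A/N^alpha + B/D^beta with the constants reported by Hoffmann et al. (2022). That fit, the 2T-token budget, and the function names are all assumptions added for illustration, not anything stated in this thread, and the fit was never validated this far outside its range.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the fitted values reported by Hoffmann et al. (2022).
# Using this fit here is an assumption; it may not hold for Llama-family models.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def loss_floor(n_params: float) -> float:
    """Predicted loss as the token count goes to infinity (the data term vanishes)."""
    return E + A / n_params**ALPHA


if __name__ == "__main__":
    # Hypothetical comparison: an 8B model on an assumed 2T-token budget
    # versus an 80M model given unlimited data.
    big = predicted_loss(8e9, 2e12)
    small_best_case = loss_floor(80e6)
    print(f"8B model, 2T tokens:         {big:.3f}")
    print(f"80M model, unlimited tokens: {small_best_case:.3f}")
    print("80M can match 8B under this fit:", small_best_case <= big)
```

Under this particular fit, the parameter term alone keeps the 80M model above the 8B model's predicted loss no matter how many tokens it sees. That is a property of the assumed formula rather than a measured result, which is exactly why extrapolating it this far deserves caution.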


pellucide 791 days ago

Thanks!
