| HN Mirror



	by immichaelwang 1166 days ago
	Isn't there more and more research coming out showing that, past a certain point (~200B), parameters have significantly diminishing returns, and that it's better to then do some supervised learning on top of the base model?

2 comments

Parameters don't have diminishing returns so much as we don't have enough (distinct) data to train models to use that many parameters efficiently.
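To make the data-bottleneck point concrete, here's a rough sketch using the Chinchilla-style parametric loss fit L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. The constants are approximately those reported by Hoffmann et al. (2022) and are only illustrative. The 300B-token dataset is a made-up example. With D held fixed, each 10x increase in N buys a smaller loss reduction, and the loss can never fall below E + B/D^β no matter how many parameters you add.

```python
# Illustrative sketch of a Chinchilla-style parametric loss fit:
#   L(N, D) = E + A / N**alpha + B / D**beta
# Constants are approximately those reported by Hoffmann et al. (2022);
# treat them as illustrative assumptions, not authoritative values.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def data_floor(n_tokens: float) -> float:
    """Limit of loss() as n_params -> infinity with the dataset held fixed."""
    return E + B / n_tokens**BETA


if __name__ == "__main__":
    tokens = 300e9  # hypothetical fixed dataset of 300B tokens
    for n in (10e9, 50e9, 200e9, 1e12):
        print(f"N={n:.0e}  loss={loss(n, tokens):.3f}")
    # Adding parameters only approaches this floor; more data lowers it.
    print(f"floor with fixed data: {data_floor(tokens):.3f}")
```

Under this fit, the parameter term and the data term shrink independently, so growing D alongside N keeps lowering the floor. Whether returns look "diminishing" depends on whether the data grows with the model.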

Can you point to any of that? My understanding is that we haven't hit diminishing returns with transformer models and scaling yet.