Hacker News
by immichaelwang 1118 days ago
Isn't there more and more research coming out showing that past a certain point (~200B parameters), adding parameters gives sharply diminishing returns, and that it's better to then do some supervised learning on top of the base model?

Parameters don't have diminishing returns so much as we don't have enough (distinct) data to train models to use that many parameters efficiently.
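Here is a minimal sketch of the parent comment's point. It assumes the parametric loss form L(N, D) = E + A/N^α + B/D^β from the Chinchilla scaling-law paper (Hoffmann et al., 2022), where N is the parameter count and D is the number of training tokens. The constants are roughly that paper's fitted values and are used only for illustration. The 300B-token dataset and the model sizes are hypothetical. With D held fixed, each 10× in N removes less loss than the last, while the data term stays put.

```python
# Minimal sketch of a Chinchilla-style parametric loss:
#   L(N, D) = E + A / N**ALPHA + B / D**BETA
# N = parameter count, D = training tokens.
# Constants are ASSUMED, roughly the fit reported by Hoffmann et al. (2022);
# they are illustrative, not authoritative.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def param_term(n_params: float) -> float:
    """Part of the loss that more parameters can still remove."""
    return A / n_params**ALPHA


def data_term(n_tokens: float) -> float:
    """Part of the loss that only more (distinct) training data can remove."""
    return B / n_tokens**BETA


def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params trained on n_tokens."""
    return E + param_term(n_params) + data_term(n_tokens)


if __name__ == "__main__":
    tokens = 300e9  # hypothetical fixed dataset of 300B tokens
    # Each 10x in parameters removes less loss; the data term never moves.
    for n in (20e9, 200e9, 2e12):
        print(f"N={n:.0e}  loss={loss(n, tokens):.3f}  "
              f"param_term={param_term(n):.3f}  data_term={data_term(tokens):.3f}")

    # Compare scaling parameters 10x versus scaling data 10x for a 200B model.
    base = loss(200e9, tokens)
    print(f"10x params: {base - loss(2e12, tokens):.3f} lower loss")
    print(f"10x data:   {base - loss(200e9, 10 * tokens):.3f} lower loss")
```

Under these assumed constants, going from 200B to 2T parameters on the same 300B tokens lowers predicted loss less than keeping 200B parameters and training on 3T tokens. The exact numbers depend entirely on the fitted constants, so read this as a qualitative picture only.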
Can you point to any of that? My understanding is that we haven't hit diminishing returns with transformer models and scaling yet.