|
|
|
|
|
by ydau
2150 days ago
|
|
GPT-3 was another buckets worth of evidence in favor of the scaling hypothesis. Performance kept improving (and cost to train kept increasing) as more parameters were added. Even with 175 billion parameters, the performance had not yet plateaued. One take-away is that throwing a lot of compute at the problem helps tremendously :). You can read more about GPT-3 here: https://lambdalabs.com/blog/gpt-3/ |
|
[1] http://incompleteideas.net/IncIdeas/BitterLesson.html
[2] http://incompleteideas.net/