Hacker News
by ydau 2150 days ago
GPT-3 was another bucket's worth of evidence in favor of the scaling hypothesis. Performance kept improving (and the cost to train kept increasing) as more parameters were added. Even at 175 billion parameters, performance had not yet plateaued. One takeaway is that throwing a lot of compute at the problem helps tremendously :).

You can read more about GPT-3 here: https://lambdalabs.com/blog/gpt-3/

1 comment

Are you alluding to "The Bitter Lesson" [1] by Rich Sutton [2]?

[1] http://incompleteideas.net/IncIdeas/BitterLesson.html

[2] http://incompleteideas.net/