Hacker News
by
space_fountain
1144 days ago
GPT-3 was released three years ago now. There have been major advances in scaling attention since then, so it would be strange if they didn't use some of them.
MichaelZuo
1142 days ago
It doesn't matter how many major advancements they made in scaling, as long as one component is O(n^2) or higher.
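A toy cost model makes this concrete. It is only a sketch: the unit costs and the function names `total_cost` and `quadratic_share` are hypothetical, and it assumes every component except one is linear in the context length n.

```python
def total_cost(n, linear_parts=3, quadratic_parts=1):
    """Abstract cost of a pipeline with some O(n) parts and some O(n^2) parts.

    Each part is assumed to cost exactly n or n*n units (hypothetical constants).
    """
    return linear_parts * n + quadratic_parts * n * n


def quadratic_share(n, linear_parts=3, quadratic_parts=1):
    """Fraction of the total cost that comes from the quadratic parts."""
    quadratic_cost = quadratic_parts * n * n
    return quadratic_cost / total_cost(n, linear_parts, quadratic_parts)


if __name__ == "__main__":
    # Print how the quadratic term's share of the cost grows with n.
    for n in (1, 10, 1_000, 100_000):
        print(n, total_cost(n), quadratic_share(n))
```

Under these assumptions the single quadratic part is a quarter of the cost at n = 1 and nearly all of it at large n, however cheap the linear parts are made.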
cubefox
1142 days ago
It's not the scale itself, it's the scaling architecture.
MichaelZuo
1142 days ago
The same applies.