by space_fountain 1144 days ago
GPT-3 was released three years ago now. There have been major advances in scaling attention since then, so it would be strange if they didn't use some of them.
1 comment

It doesn't matter how many major advances they've made in scaling as long as one component is still O(n^2) or worse in the sequence length n.
The problem isn't the scale itself, it's how the architecture scales.
So the same limitation still applies.
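
To put rough numbers on that, here is a toy cost model (my own assumption, not how any particular GPT model is actually built): one term grows like n^2 for naive self-attention, one grows like n for everything else. The constants and the names `attention_cost`, `linear_cost` and `doubling_ratio` are made up for illustration.

```python
def attention_cost(n, d=64):
    # Assumed cost of naive self-attention: QK^T plus scores @ V,
    # each roughly n * n * d multiply-adds.
    return 2 * n * n * d


def linear_cost(n, d=64):
    # Hypothetical per-token work (projections, MLP): linear in n.
    return 8 * n * d * d


def total_cost(n, d=64):
    return attention_cost(n, d) + linear_cost(n, d)


def doubling_ratio(n, d=64):
    # How much more work it takes to go from n tokens to 2n tokens.
    return total_cost(2 * n, d) / total_cost(n, d)


if __name__ == "__main__":
    for n in (64, 1024, 65536):
        print(n, round(doubling_ratio(n), 3))
```

In this model, doubling a short context costs a bit over 2x, but for long contexts the ratio creeps toward 4x however cheap the linear part is made. Only changing the quadratic component itself changes that.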