by MichaelZuo 1144 days ago
Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.

GPT-3 was released three years ago now. There have been major advances in scaling attention since then, so it would be strange if they didn't use some of them.
It doesn't matter how many major advancements they made in scaling: as long as one component is O(n^2) or higher, that component dominates the total cost as n grows.
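To make the asymptotic point concrete, here is a rough sketch of a toy cost model with one linear component and one quadratic component. The function name and the coefficients (1000 and 1) are made up for illustration and are not measurements of any real model; the only claim is that the quadratic share approaches 1 as n grows, however heavily the linear part is weighted.

```python
def quadratic_share(n, linear_coeff=1000.0, quadratic_coeff=1.0):
    """Fraction of total cost due to the O(n^2) term in a toy cost model.

    Both coefficients are hypothetical; they only set where the crossover happens.
    """
    linear = linear_coeff * n           # stand-in for all the O(n) components
    quadratic = quadratic_coeff * n * n  # stand-in for one dense attention component
    return quadratic / (linear + quadratic)


if __name__ == "__main__":
    # The quadratic term starts small but takes over as the sequence length grows.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(quadratic_share(n), 3))
```

Making the linear coefficient larger only pushes the crossover point further out; it doesn't change which term wins in the limit.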
It's not the scale itself, it's the scaling architecture.
The same applies.