Hacker News
by
MichaelZuo
1144 days ago
Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.
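To make the "n^2" in that quote concrete, here is a rough sketch of why dense attention is quadratic in sequence length: every one of the n tokens gets a score against every one of the n tokens. The function names and the toy 2-dimensional vectors are made up for illustration and are not taken from any GPT-3 code.

```python
import math

def dense_attention_scores(queries, keys):
    """Return the full n x n matrix of scaled dot-product scores."""
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    return [
        [scale * sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]

def score_count(n):
    """Count the score entries produced for a sequence of n toy tokens."""
    tokens = [[1.0, 0.0] for _ in range(n)]  # hypothetical 2-d embeddings
    scores = dense_attention_scores(tokens, tokens)
    return sum(len(row) for row in scores)
```

Doubling n from 4 to 8 takes the number of scores from 16 to 64, which is the quadratic growth the quoted comment refers to.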
space_fountain
1144 days ago
GPT3 was released 3 years ago now. There have been major advancements in scaling attention since then, so it would be strange if they didn't use some of them.
MichaelZuo
1142 days ago
It doesn't matter how many major advancements they made in scaling, as long as one component is O(n^2) or higher.
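A toy cost model shows the asymptotic point. Assume, purely for illustration, that total work is a linear part plus a quadratic part; the coefficients below are hypothetical and say nothing about any real model. Even when the linear part is given a 1000x head start, the quadratic part ends up accounting for nearly all of the cost as n grows.

```python
def total_cost(n, linear_coeff=1000.0, quadratic_coeff=1.0):
    """Hypothetical cost: everything that scales as O(n) plus one O(n^2) part."""
    return linear_coeff * n + quadratic_coeff * n * n

def quadratic_share(n, linear_coeff=1000.0, quadratic_coeff=1.0):
    """Fraction of the total cost that comes from the O(n^2) part."""
    quadratic = quadratic_coeff * n * n
    return quadratic / total_cost(n, linear_coeff, quadratic_coeff)
```

With these made-up coefficients the quadratic part is half the cost at n = 1,000 and more than 99% of it at n = 100,000. Shrinking the quadratic coefficient only moves the crossover point; it does not remove it.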
cubefox
1142 days ago
It's not the scale itself, it's the scaling architecture.
MichaelZuo
1142 days ago
The same applies.