by cubefox 1144 days ago
O(n^2) seems unlikely:

https://cognitiverevolution.substack.com/p/openais-foundry-l....

https://news.ycombinator.com/item?id=34977194#:~:text=Sparse...


Your second link has the immediate comment "Gpt3 includes dense attention layers that are n^2". So it's not at all unlikely.
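For anyone wondering where the n^2 in "dense attention" comes from, here is a rough sketch in plain Python. It assumes nothing about how GPT-3 is actually implemented; `dense_attention_scores` and `count_scores` are made-up names, and the token vectors are arbitrary toy values. It only shows that scoring every token against every other token produces an n-by-n matrix.

```python
def dense_attention_scores(queries, keys):
    """Return the full score matrix: one dot product per (query, key) pair."""
    return [
        [sum(q_i * k_i for q_i, k_i in zip(q, k)) for k in keys]
        for q in queries
    ]


def count_scores(n, dim=4):
    """Count the score entries produced for a sequence of n toy tokens."""
    # Hypothetical toy vectors; only the shape of the result matters here.
    tokens = [[float(i + j) for j in range(dim)] for i in range(n)]
    scores = dense_attention_scores(tokens, tokens)
    return sum(len(row) for row in scores)


if __name__ == "__main__":
    for n in (8, 16, 32):
        print(n, count_scores(n))
```

Doubling the sequence length quadruples the number of scores, which is the quadratic growth the quoted comment refers to.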
GPT-3 was released three years ago now. There have been major advances in scaling attention since then, so it would be strange if they didn't use some of them.
It doesn't matter how many major advancements they made in scaling: as long as one component is O(n^2) or higher, the total cost grows at least that fast, because the highest-order term dominates for large n.
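To make that concrete, here is a toy cost model, not a measurement of any real system. The function names and the weights (a linear part that is 1000x more expensive per token than the quadratic part) are assumptions picked purely for illustration.

```python
def total_cost(n, linear_weight=1000.0, quadratic_weight=1.0):
    """Hypothetical cost: an O(n) component plus an O(n^2) component."""
    linear = linear_weight * n
    quadratic = quadratic_weight * n * n
    return linear + quadratic


def quadratic_share(n, linear_weight=1000.0, quadratic_weight=1.0):
    """Fraction of the total cost that comes from the quadratic component."""
    quadratic = quadratic_weight * n * n
    return quadratic / total_cost(n, linear_weight, quadratic_weight)


if __name__ == "__main__":
    for n in (100, 1_000, 1_000_000):
        print(n, round(quadratic_share(n), 4))
```

Even with the linear part weighted 1000x heavier, the quadratic part accounts for half the cost at n = 1000 and nearly all of it by n = 1,000,000. Shrinking the constant only moves the crossover point; it doesn't change the growth rate.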
It's not the scale itself, it's the scaling architecture.
The same applies.