by elcritch 455 days ago
What are other barriers in transformers? Or is the normalization layer the primary one?
1 comment

Dot-product attention is the biggest barrier. This is why there are so many attempts to linearize it.
...that fail. Linearization is a bad idea, but plenty of other optimizations are done.
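
For anyone unsure what "linearize" means here, a rough sketch follows. It is not taken from the thread: it assumes single-head, non-causal attention, and the feature map `phi` (elu(x) + 1) is just one common choice, picked for illustration. The first function builds a score for every query/key pair, so its work grows with n^2. The second sums over the keys once and reuses that summary for every query, so its work grows with n. The two do not compute the same weights, which is the approximation the comments above are arguing about.

```python
import math

def softmax_attention(Q, K, V):
    """Scaled dot-product attention; scores every (query, key) pair: O(n^2 * d)."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out

def phi(x):
    """Assumed feature map elu(x) + 1, which is positive for every input."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def linear_attention(Q, K, V):
    """Kernelized attention: one pass over keys, then one pass over queries."""
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # running sum of phi(k) * v^T, size d x dv
    z = [0.0] * d                       # running sum of phi(k)
    for k, v in zip(K, V):
        fk = phi(k)
        for i in range(d):
            z[i] += fk[i]
            for j in range(dv):
                S[i][j] += fk[i] * v[j]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[i] * z[i] for i in range(d))
        out.append([sum(fq[i] * S[i][j] for i in range(d)) / denom
                    for j in range(dv)])
    return out
```

Both functions return one output row per query, and each row is a weighted average of the rows of `V`. Only the weights differ: softmax weights in the first, `phi(q) . phi(k)` weights in the second.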