by sottol 1153 days ago
Classic attention is quadratic in context length, and faster alternatives seem not to perform as well. I wonder how Hyena compares to linear attention algorithms.
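To make the quadratic-vs-linear point concrete, here is a minimal pure-Python sketch. It is not Hyena and not any particular library's API. It assumes one common linear attention variant: the kernel trick with the feature map phi(x) = elu(x) + 1, as in Katharopoulos et al., "Transformers are RNNs" (2020). Softmax attention builds an n x n score matrix. The linear version sums phi(k_j) v_j^T over all keys once and reuses that sum for every query, so its cost grows linearly in n. The helper names are hypothetical, chosen for illustration.

```python
import math


def softmax_attention(q, k, v):
    """Scaled dot-product attention. Each query scores every key,
    i.e. an n x n score matrix, so cost is quadratic in context length n."""
    d = len(q[0])
    out = []
    for qi in q:
        # One row of the n x n score matrix.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def feature_map(x):
    """elu(x) + 1: a positive feature map (assumed choice, one of several)."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def linear_attention(q, k, v):
    """Kernelized attention: the similarity is phi(q) . phi(k) instead of exp(q . k).
    By associativity, S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) are computed
    once, so the cost is linear in n (times feature and value dimensions)."""
    d_v = len(v[0])
    fk = [feature_map(kj) for kj in k]
    d_f = len(fk[0])
    s = [[0.0] * d_v for _ in range(d_f)]  # d_f x d_v running sum
    z = [0.0] * d_f                        # normalizer sum
    for fkj, vj in zip(fk, v):
        for a in range(d_f):
            z[a] += fkj[a]
            for c in range(d_v):
                s[a][c] += fkj[a] * vj[c]
    out = []
    for qi in q:
        fq = feature_map(qi)
        denom = sum(fq[a] * z[a] for a in range(d_f))
        out.append([sum(fq[a] * s[a][c] for a in range(d_f)) / denom
                    for c in range(d_v)])
    return out


def kernel_attention_quadratic(q, k, v):
    """Same kernel as linear_attention, evaluated the n x n way (for checking)."""
    out = []
    for qi in q:
        fq = feature_map(qi)
        w = [sum(a * b for a, b in zip(fq, feature_map(kj))) for kj in k]
        total = sum(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / total
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
    k = [[0.2, 0.1], [-0.1, 0.4], [0.5, -0.3]]
    v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(softmax_attention(q, k, v))
    print(linear_attention(q, k, v))
```

The linear and quadratic kernel versions give the same result up to floating-point error; only the order of summation changes. The linear version does not reproduce softmax attention, though. It swaps in a different similarity function, so its outputs differ from `softmax_attention`.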