HN Mirror

by cubefox 139 days ago

Sounds interesting, but...

> these models dominate both exponential attention and linear attention at long-context training

There is no exponential attention; standard attention is quadratic in sequence length. Strange mistake.
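
A minimal sketch of where the "quadratic" comes from, assuming naive single-head softmax attention in pure Python. The function name `softmax_attention` and the toy inputs are made up for illustration and are not taken from the paper. It counts one score per (query, key) pair, so the count grows as n * n; the only exponential involved is the elementwise `exp` inside softmax, which does not change how cost scales with length.

```python
import math

def softmax_attention(q, k, v):
    """Naive single-head softmax attention over lists of vectors.

    Returns the outputs and the number of query-key scores computed.
    """
    n, d = len(q), len(q[0])
    scores_computed = 0
    out = []
    for i in range(n):
        # One score per (query, key) pair: n scores per query, n * n in total.
        row = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
               for j in range(n)]
        scores_computed += len(row)
        # Softmax over the row; subtract the max for numerical stability.
        m = max(row)
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        # Weighted average of the value vectors.
        out.append([sum(w[j] / z * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out, scores_computed

for n in (4, 8, 16):
    x = [[float(i + c) for c in range(2)] for i in range(n)]
    _, count = softmax_attention(x, x, x)
    print(n, count)  # count equals n * n
```

Doubling the sequence length quadruples the number of scores, which is quadratic growth, not exponential.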