Hacker News
by cs702 1057 days ago
...from the same team that brought you FlashAttention, S4, H3, and Hyena.

As always, we have to wait until this has been tested at much larger scale.

1 comment

Are those good or bad?
FlashAttention is an amazing improvement over the previous state of the art. The others are still highly experimental, but they seem likely to at least contribute significant knowledge to whatever ends up surpassing the Transformer (assuming something does).