Hacker News
by PoignardAzur 617 days ago
For the amount of theoretical work behind those Mamba2 blocks (I can barely understand their paper on the subject), those are some extremely modest performance gains.

Attention remains king.


> I can barely understand their paper on the subject

Yannic Kilcher has a new video touching on Mamba in an intuitive way.

https://www.youtube.com/watch?v=jE9jAZC42NE

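Mamba is also much more efficient to run, watt-wise: its per-token inference cost stays flat as the context grows, while attention's grows with the length of the KV cache.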
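To put rough numbers on that, here's a toy Python sketch that counts multiply-adds per decoding step. It's my own simplification, not Mamba2's actual kernel: the recurrence uses fixed coefficients `a`, `b`, `c` (real Mamba makes them input-dependent), the function names and the sizes `d_model=1024`, `state_size=16` are hypothetical, and op counts are only a proxy for energy.

```python
# Toy per-token decoding cost: attention over a KV cache vs. a fixed-size
# diagonal linear recurrence (SSM-style). Hypothetical sizes; NOT Mamba2's kernel.

def attention_step_macs(context_len: int, d_model: int) -> int:
    """Multiply-adds for one single-head attention decoding step.

    The new query is dotted with every cached key (context_len * d_model),
    then the weights mix every cached value (context_len * d_model).
    Projections and the softmax itself are ignored.
    """
    return 2 * context_len * d_model


def ssm_step_macs(state_size: int, d_model: int) -> int:
    """Multiply-adds for one step of a diagonal linear recurrence.

    Per channel and state element: h = a*h + b*x (2 multiplies) and
    y += c*h (1 multiply). Does not depend on context length at all.
    """
    return 3 * state_size * d_model


def ssm_scan(xs, a, b, c):
    """Single-channel recurrence h_t = a*h_{t-1} + b*x_t, y_t = sum(c*h_t)."""
    h = [0.0] * len(a)  # fixed-size state, however long xs gets
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    d_model, state_size = 1024, 16  # hypothetical model sizes
    for n in (1_000, 10_000, 100_000):
        print(n, attention_step_macs(n, d_model), ssm_step_macs(state_size, d_model))
```

The attention column scales linearly with context length (2,048,000 MACs at 1k tokens, 204,800,000 at 100k), while the recurrence stays at 49,152 regardless, which is where the energy advantage at long contexts comes from.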