by srush 854 days ago
Hi! Blog author. This was an attempt a couple of years ago to understand and write about this paper in a detailed way. Here is a video going through this topic as well: https://youtu.be/dKJEpOtVgXc?si=PDNO0B0qi6ARHaeb

Section 2 of the blog post is no longer very relevant. A lot of advances (DSS, S4D) simplified that part of the process. Arguably, all of this should also be updated for Mamba (same authors).

2 comments

Thanks for your spectacular resources! I see that you began an Annotated Mamba repository -- any chance you could share when that blog page might go live?
This was an excellent write-up, thanks. It'll help me understand the Mamba work a lot more.

I still find it really confusing how a linear model can perform so well.