by intalentive 845 days ago

Nice post. A couple things to add:

1. The Mamba co-author was also the FlashAttention lead author.

2. The secret ingredient that makes SSMs viable for deep learning is HiPPO theory. If you start with random initialization you're not going to get results. What you need is "optimal online function approximation" using Legendre polynomials, a Fourier basis, etc., in matrix form. The Mamba story starts with Legendre Memory Units.
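To make "in matrix form" concrete, here is a minimal sketch of the Legendre case (HiPPO-LegS), as I understand the construction from the HiPPO paper. It assumes the sign convention used in S4-style write-ups, where the state evolves as x' = Ax + Bu; the original paper writes the same matrix with the opposite sign. The name `hippo_legs` and the state size are illustrative only, not taken from any library.

```python
import math

def hippo_legs(n_state):
    """Build the HiPPO-LegS (A, B) pair as plain nested lists.

    Assumed convention: x'(t) = A x(t) + B u(t), so A carries a minus sign.
    """
    A = [[0.0] * n_state for _ in range(n_state)]
    for n in range(n_state):
        for k in range(n_state):
            if n > k:
                # Below the diagonal: -sqrt(2n+1) * sqrt(2k+1)
                A[n][k] = -math.sqrt(2 * n + 1) * math.sqrt(2 * k + 1)
            elif n == k:
                # Diagonal: -(n+1)
                A[n][k] = -(n + 1.0)
            # Above the diagonal stays zero, so A is lower triangular.
    B = [math.sqrt(2 * n + 1) for n in range(n_state)]
    return A, B

A, B = hippo_legs(4)
for row in A:
    print(["%.3f" % v for v in row])
```

The point is that A is a fixed, structured matrix rather than a random one: initializing the SSM state matrix this way is what the "optimal online function approximation" result buys you.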

Invariably someone comments, "How do we know that it scales?" We don't. But the lead author has backing and a new startup at cartesia.ai. Could be the next Mistral.

1 comments

sigmoid10 845 days ago

The architecture is completely public. I would be surprised if certain other players (including but not limited to Mistral AI) are not training models yet. We'll hear soon enough if this is viable. Maybe not for official release candidates, but at least for internal testing.


3abiton 845 days ago

Nonetheless, this is extremely exciting, unlike RWKV and Retentive Network (RetNet).


cztomsik 844 days ago

Why? From what I've read, those architectures have many similarities (and the same weaknesses).
