by swimwiththebeat 839 days ago
Does anyone know if this is using the Mamba architecture[1] instead of transformers? It looks like it uses a state space model (SSM) layer.

[1]: https://arxiv.org/abs/2312.00752

2 comments

We covered state space models in a blog post here - https://blog.dragonscale.ai/state-space-models/

It gives an overview of Mamba and StripedHyena.

It came out earlier than Mamba. It uses Hyena hierarchy blocks, which are considered SSMs but are not the same as Mamba.