by spxneo 806 days ago
I'm not smart enough to know the significance of this... is Griffin like Mamba?

Yes. Like RWKV and Mamba, this is part of a new generation of models that are more like big RNNs than the pure transformers we have now.
Isn't that how previous models worked, before the "Attention Is All You Need" paper?
And is Griffin a state space model?
No, it's a combination of an RNN and a transformer: it mixes gated linear recurrences with local attention.
I mean, SSMs are in fact RNNs under the hood.
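To make the "RNNs under the hood" point concrete, here is a toy sketch of a scalar, time-invariant linear SSM. It assumes a single state dimension and fixed parameters `a`, `b`, `c` (names made up for illustration; real models such as Mamba use learned, input-dependent parameters and much larger states). The same outputs can be computed either step by step like an RNN, or as a convolution over the whole input.

```python
def ssm_recurrent(xs, a, b, c):
    """RNN view: carry one hidden state h and update it per input."""
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x      # state update: h_t = a * h_{t-1} + b * x_t
        ys.append(c * h)       # readout:      y_t = c * h_t
    return ys


def ssm_convolution(xs, a, b, c):
    """Convolution view: y_t = sum_k (c * a^k * b) * x_{t-k}."""
    kernel = [c * (a ** k) * b for k in range(len(xs))]
    return [
        sum(kernel[k] * xs[t - k] for k in range(t + 1))
        for t in range(len(xs))
    ]
```

Calling both functions with the same inputs, e.g. `xs = [1.0, 2.0, 0.5, -1.0]` and `a, b, c = 0.5, 1.0, 2.0`, should give matching outputs up to floating-point error. The recurrent form is what you would run at inference time; the convolution form only works this simply because the parameters do not depend on the input.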
At the end of the day, either you carry around a hidden state, or you have a fixed window for autoregression.

You can call hidden states "RNN-like" and autoregressive windows "transformer-like", but apart from those two core paradigms I don't know of other ways to do sequence modelling.

Mamba/RWKV/Griffin are somewhere between those two extremes.
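A rough sketch of the two extremes, using toy stand-ins of my own rather than anything from the actual models: `HiddenStateModel` keeps one number as its memory (an exponential moving average), while `WindowModel` keeps the last `window` inputs and averages them. The class names, `decay`, and `window` are all hypothetical.

```python
from collections import deque


class HiddenStateModel:
    """'RNN-like': constant-size memory, every past input leaves a trace."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.state = 0.0

    def step(self, x):
        # Blend the new input into the single carried state.
        self.state = self.decay * self.state + (1 - self.decay) * x
        return self.state


class WindowModel:
    """'Transformer-like': exact access to the last `window` inputs only."""

    def __init__(self, window=4):
        self.buffer = deque(maxlen=window)

    def step(self, x):
        # Oldest input is dropped automatically once the buffer is full.
        self.buffer.append(x)
        return sum(self.buffer) / len(self.buffer)
```

Feed both an impulse followed by zeros, e.g. `[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]`. The hidden-state model still carries a small, decayed trace of the impulse at the end, while the window model forgets it completely once it slides out of the window. That is the trade-off the hybrids are trying to balance.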