by VHRanger
803 days ago
At the end of the day, either you carry around a hidden state, or you have a fixed window for autoregression. You can call hidden states "RNN-like" and autoregressive windows "transformer-like", but apart from those two core paradigms I don't know of other ways to do sequence modelling. Mamba/RWKV/Griffin are somewhere between those two extremes.
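
To make the two extremes concrete, here is a rough toy sketch in Python. Everything in it is made up for illustration: `run_recurrent`, `run_windowed`, and the `ema` / `mean` stand-ins are hypothetical names, and the stand-ins are plain arithmetic rather than learned networks. It only shows where the memory of the past lives in each paradigm.

```python
from collections import deque


def run_recurrent(tokens, step, h0):
    """Hidden-state ("RNN-like"): fold all history into one fixed-size state."""
    h = h0
    outputs = []
    for x in tokens:
        h = step(h, x)  # the state is the only memory of the past
        outputs.append(h)
    return outputs


def run_windowed(tokens, predict, window):
    """Fixed-window ("transformer-like"): recompute from the last `window` inputs."""
    ctx = deque(maxlen=window)  # older tokens fall out of the context entirely
    outputs = []
    for x in tokens:
        ctx.append(x)
        outputs.append(predict(list(ctx)))
    return outputs


# Toy stand-ins for learned functions (assumptions, not real model components).
def ema(h, x):
    return 0.5 * h + 0.5 * x  # exponential moving average as the state update


def mean(ctx):
    return sum(ctx) / len(ctx)  # plain average over whatever is in the window


seq = [8.0, 0.0, 0.0, 0.0]
recurrent_out = run_recurrent(seq, ema, 0.0)
windowed_out = run_windowed(seq, mean, window=2)

print(recurrent_out)  # [4.0, 2.0, 1.0, 0.5]
print(windowed_out)   # [8.0, 4.0, 0.0, 0.0]
```

In this toy, the recurrent version still carries a decayed trace of the first token at the last step, while the windowed version has forgotten it completely once it slides out of the window. That is the trade-off in miniature: a compressed, lossy summary of everything versus exact access to a bounded slice.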