by thatguysaguy
846 days ago
Too new is definitely one thing. Someone is going to have to gamble on actually paying for a serious pretraining run with this architecture before we know how it really stacks up against transformers. There are some papers suggesting that transformers are better than SSMs in fundamental ways (e.g. SSMs cannot do arbitrary key-based recall from their context: https://arxiv.org/abs/2402.01032). This means switching over is not a no-brainer.