by Der_Einzige
844 days ago
First it was Longformer and linear attention models. Then it was RWKV, and now it's Mamba. So many bombastic claims of improved architectural performance, and no open-source models that actually beat the thing they purport to beat. The proof is always in the pudding, and these models will remain a curiosity for most until their weights benchmark favorably on LLM leaderboards.
|
In that context, all new research directions are valuable simply because they expand the foundation of the field. Five years from now, who knows what the most effective models will use under the hood, but the more we can learn about these architectures in general, the better.