by D0TheMath
912 days ago

The jury very much still seems out on this. Computationally speaking, I believe Mamba is Turing complete, while transformers aren't (they can't do loops: each token passes through a fixed number of layers), so technically Mamba is more expressive. But of course, the question is always whether it ends up with lower loss.
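For illustration, here is a minimal sketch of the loop structure being contrasted. It assumes a scalar state and a made-up input-dependent gate, so it is not Mamba's actual parameterization. The recurrence performs one sequential state update per input token, which means the number of steps grows with the input length. A transformer, by contrast, applies a fixed stack of layers. The sketch shows only this difference in loop structure and does not demonstrate Turing completeness.

```python
import math


def toy_selective_scan(xs, bias=0.0):
    """Toy scalar recurrence loosely in the spirit of a selective state-space model.

    Assumption: the scalar state and the sigmoid gate below are hypothetical,
    chosen for illustration only.
    """
    h = 0.0
    ys = []
    for x in xs:  # one sequential update per token; loop length follows len(xs)
        a = 1.0 / (1.0 + math.exp(-(x + bias)))  # hypothetical input-dependent decay in (0, 1)
        h = a * h + (1.0 - a) * x  # carry state forward, mixing in the new input
        ys.append(h)
    return ys


print(toy_selective_scan([2.0, -1.0, 0.5]))
```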