Hacker News
by HarHarVeryFunny 781 days ago
If Mamba really were as capable as a Transformer on tasks that require attending accurately to long context, then there'd be no need for Jamba (a Mamba+Transformer hybrid).

Your argument that "if we train a Mamba SSM to be as good as a Transformer, then it'll be as good as a Transformer" seems a tad circular...