Hacker News new | ask | show | jobs
by wityl 868 days ago
They were previously parallelizable (via fft), but performed poorly on language modeling tasks.

Mamba adds a dependence on the inputs that makes language modeling competitive with transformers, but that prevents using the fft approach. So they switch to a method using parallel prefix scan.