| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by wityl 868 days ago
	They were previously parallelizable (via fft), but performed poorly on language modeling tasks. Mamba adds a dependence on the inputs that makes language modeling competitive with transformers, but that prevents using the fft approach. So they switch to a method using parallel prefix scan.