| HN Mirror


	by espadrine 845 days ago
	Another factor is that Mamba required a highly custom implementation, down to custom fused kernels, which I expect would need to be reimplemented in DeepSpeed or an equivalent library for a larger training run spanning thousands of GPUs.

1 comment

cs702 845 days ago

Not necessarily:

https://www.reddit.com/r/MachineLearning/comments/1amb3xu/d_...