| HN Mirror



	by lpasselin 843 days ago
	The Mamba paper shows significant improvements at all model sizes up to 1B, the largest one tested. Is there any reason why it wouldn't scale to 7B or more? Have they tried it?

1 comment

samus 843 days ago

That's the issue: I keep hearing that meaningfully training such a large model is beyond a small research group's budget. You don't just need GPU time, you also need data. And just using the dregs of the internet doesn't cut it.
