Hacker News
by Der_Einzige 846 days ago
Very annoying namespace conflict, since a package called "mamba" (a faster reimplementation of the Python conda package manager) existed for a while before this architecture was even dreamed up.

https://github.com/mamba-org/mamba

Beyond that, I'll care about an alternative to transformers when an open-source 7b-34b model built on it outperforms comparable transformer models. So far that hasn't happened.


> Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting.
The Mamba paper shows significant improvements at all model sizes, up to 1b, the largest one tested.

Is there any reason why it wouldn't scale to 7b or more? Have they tried it?

That's the issue: I keep hearing that training such a large model meaningfully is beyond a small research group's budget. You don't just need GPU time, you also need data, and just using the dregs of the internet doesn't cut it.
I use the former and have been experimenting with the latter. Fortunately, the contexts are separate enough that they never come up in the same sentence.
I was using mamba to install mamba the other day, when suddenly I had to run for a live mamba.
While chewing a Mamba?

https://www.mamba.us/

I had the exact same experience, and I was also using it for a web application powered by the Mamba web framework.