by sevagh 915 days ago
Conda is the latest LLM CLI frontend that's an MoE of Mistral 7B, Llama 17B, Falcon 32C, and the Yamaha YZ50 quad bike.

Mamba is a PoC of the latest SSM architecture for LLMs, named S6, and is a dense counterpart to Transformers, trained on 300B tokens of the Pile in sizes up to 2.7B parameters. Mamba proves that S6 LLMs train faster, run faster, use less VRAM, and reach lower perplexity and better benchmark scores with the exact same training data.

That is actually accurate but probably sounds just as outlandish.

The approachable version is: Mamba is a proof-of-concept language model that showcases a new LLM architecture called S6, a competitor to the Transformer architecture (the 'T' in ChatGPT), and it is better in every measurable way.

> and the Yamaha YZ50 quad bike.

Well played.