by MacsHeadroom 914 days ago
Mamba is a PoC of the latest SSM architecture for LLMs, named S6. It is a dense counterpart to Transformers, trained on 300B tokens of the Pile at sizes up to 2.7B parameters. Mamba proves that S6 LLMs train faster, run faster, use less VRAM, and reach lower perplexity and better benchmark scores on exactly the same training data.

That is actually accurate but probably sounds just as outlandish.

The approachable version is: Mamba is a proof-of-concept language model that showcases a new LLM architecture called S6. S6 is a competitor to the Transformer architecture (the 'T' in ChatGPT), and it is better in every measurable way.