by moffkalast
845 days ago
If I'm not mistaken, the largest Mamba model right now is 2.8B, and it's undertrained with low quality data (the Pile only). The main problem is that the architecture is new and unproven. It should become very interesting once someone with both data and significant financial backing takes the plunge and trains something of notable size. Llama-3 might already end up being that attempt, as we seem to be heavily into diminishing returns for transformers.
> low quality data (the Pile only)
The Pile is pretty good quality-wise. It's mostly the size (300B tokens) that's limiting.
[1]: https://huggingface.co/state-spaces/mamba-2.8b-slimpj
[2]: https://huggingface.co/stabilityai/stablelm-3b-4e1t