Hacker News new | ask | show | jobs
by lukebechtel 331 days ago
It's meant to replace the BPE tokenizer piece, so it isn't a full Language Model by itself.

In fact in Gu's blog post (linked in a post below) it's mentioned that they created a Mamba model that used this in place of the tokenizer.