by _t89y
854 days ago
It's pretty wild how little discussion there's been about the core feature of these models. It's as if this aspect of their development has been solved. Basically all NLP publications today take these BPE tokens as a starting point, and if the tokens are mentioned at all, they're mentioned only in passing.

And while it's a core feature, it's a fairly robust one: you can get some targeted improvements, but the default option(s) are good enough and you won't improve much over them.