by PeterisP
854 days ago
It makes sense: publications write about the things they added, changed or evaluated, not about all the (many!) things they do exactly as everyone else does. So tokenization would be mentioned only if the publication is explicitly about a different tokenization. And while it's a core feature, it's also a fairly robust one: you can get some targeted improvements, but the default option(s) are good enough that you won't improve much over them.
That's my first point. In 10 years we've had word2vec, GloVe, GPT-2 and... tiktoken. lol. It's as if directional, numeric magnitudes in an embedding space of arbitrary dimensionality have magically captured, or will magically capture, the nuances and expressivity of language. Optimization techniques and new strategies for domain adaptation are what matter, particularly for mobile devices, on-device ASR and short-form videos.
I don't think "robust" is a good characterization of clusters of semantic attributes in space, or of a distributional semantics of language. I'd say "crude" and "without understanding" are more accurate descriptions. Occasionally capturing semantic properties is not the same thing as having a semantics.
By "targeted improvements" you must be referring to domain adaptation, and by "the default option" you must be referring to attention over BPE tokens? You can move directional quantities around in directional-quantity space all day. If it results in expected behavior for your application that you weren't getting before, that's great. If that's all you want to get out of these models, then indeed there's nothing to do here. I'm not after improvements so much as I'm after something that works.
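For readers unfamiliar with the "default option" being argued about, here is a minimal sketch of how byte-pair encoding (BPE) learns its vocabulary: repeatedly find the most frequent adjacent symbol pair in a corpus and merge it into a new symbol. This is a toy, character-level version in the style of the original BPE-for-NMT recipe, assuming a made-up word list and an end-of-word marker `</w>`. It is not how GPT-2 or tiktoken are actually implemented (those operate on bytes with pretokenization rules), and all function names here are hypothetical.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = {}
    for word, freq in vocab.items():
        symbols = word.split()
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = " ".join(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy example)."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = dict(Counter(" ".join(w) + " </w>" for w in words))
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab
```

For example, `learn_bpe(["abc", "abd"], 1)` merges `("a", "b")` first, because that pair occurs in both words while every other pair occurs only once.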