

by _t89y 844 days ago

Thanks for this perspective on the tradeoff between accuracy and efficiency, and for the insight that an adequately pre-trained model should be able to recover information lost to bad tokens.

Tokenization, the gateway to word embeddings, is a means to an end. I'm not suggesting that better tokens are needed or that BPE tokens should be replaced with something else. I'm suggesting that aiming for distributional semantics is setting the bar pretty low, and that there are better places to end up than These Things Are Over Here And Those Things Are Over There Let's Combine Them And See What Happens. I'm expressing disbelief that these representations have been taken at face value and that there has been practically no discussion of applying alternative formalisms that may be more expressive.

Modeling language in a latent space only makes sense for certain aspects of language and certain kinds of analyses. Crucially, you have to have meaningful primitives to begin with. The idea that an understanding of language and of the world is somehow going to emerge from mapping character spans onto a latent space and combining them with dot-product attention is pretty half-baked. These systems remain in Firth Mode™.