by thesz
932 days ago
There are character embeddings that let you recover a word embedding just by summing the embeddings of the individual bytes/chars in the word: https://github.com/sonlamho/Char2Vec LM tokenizers reserve tokens for individual characters so that scrambled or new words can still be encoded. And most LMs see scrambled words as part of their training corpus, so they learn character-level embeddings. So, basically, the paper is very old news. This behavior is expected.
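To make the summing idea concrete, here is a minimal sketch. It is not the code from the linked repository: the table is filled with random numbers instead of trained values, and the names `make_char_table`, `embed_word` and `DIM` are made up for illustration. It only shows that a plain sum of per-character vectors does not depend on character order, so a scrambled word gets the same vector as the original.

```python
import random

DIM = 8  # hypothetical embedding size, chosen only for this sketch


def make_char_table(alphabet, dim=DIM, seed=0):
    """Build a toy character-embedding table with random (untrained) vectors."""
    rng = random.Random(seed)
    return {ch: [rng.gauss(0.0, 1.0) for _ in range(dim)] for ch in alphabet}


def embed_word(word, table):
    """Word vector = element-wise sum of the vectors of its characters."""
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for ch in word:
        for i, x in enumerate(table[ch]):
            vec[i] += x
    return vec


table = make_char_table("abcdefghijklmnopqrstuvwxyz")
original = embed_word("language", table)
scrambled = embed_word("lnaugage", table)  # same letters, different order

# The sum ignores order, so the two vectors agree up to float rounding.
same = all(abs(a - b) < 1e-9 for a, b in zip(original, scrambled))
print(same)
```

A real model would learn the table during training and would usually add some position information on top; the order-invariant sum is just the simplest case of building a word vector from its characters.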