by hiker512 3030 days ago
Can you recommend an implementation or paper for handling OOV words via character-level embeddings?
2 comments

We just open-sourced an easy-to-use library called Magnitude that handles out-of-vocabulary words and uses Annoy indexing for fast most_similar queries for word2vec, GloVe, and fastText:

https://github.com/plasticityai/magnitude

Thank you very much. This might help with my master's thesis :)
Great work!
I agree, and I can also 100% subscribe to your initial comments on the actual benefits & limitations of word vectors.
Just an FYI, fastText's default implementation handles OOV words via character n-grams (see switches -minn and -maxn). The -wordNgrams switch separately controls word n-grams.

https://fasttext.cc/docs/en/options.html
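
For anyone wondering what the character n-gram approach looks like in practice, here is a minimal sketch in plain Python. It is not fastText's actual code: the helper names (char_ngrams, fnv1a, oov_vector), the toy random table, and the bucket count are all made up for illustration, and the hash is a generic FNV-1a that I would not assume is byte-compatible with fastText's. The idea is just that a word is wrapped in < and >, split into n-grams between minn and maxn characters long, each n-gram is hashed into a fixed table of vectors, and the OOV word's vector is the average of those rows.

```python
import random


def char_ngrams(word, minn=3, maxn=6):
    """Return the character n-grams of '<word>' with lengths minn..maxn."""
    token = f"<{word}>"  # boundary markers distinguish prefixes and suffixes
    return [
        token[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(token) - n + 1)
    ]


def fnv1a(text):
    """32-bit FNV-1a hash over UTF-8 bytes (illustrative, not fastText's exact hash)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF
    return h


def oov_vector(word, table, minn=3, maxn=6):
    """Average the table rows that the word's n-grams hash into."""
    dim = len(table[0])
    grams = char_ngrams(word, minn, maxn)
    if not grams:  # word too short to produce any n-gram
        return [0.0] * dim
    rows = [table[fnv1a(g) % len(table)] for g in grams]
    return [sum(row[d] for row in rows) / len(rows) for d in range(dim)]


# Toy stand-in for a trained n-gram embedding table: 1000 buckets, 4 dimensions.
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(1000)]

print(char_ngrams("where", 3, 3))
print(oov_vector("unseenword", table))
```

In a trained model the table rows are learned rather than random, so words that share many n-grams (e.g. a misspelling and its correct form) end up with similar vectors even if one of them never appeared in training.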