Hacker News
by
hiker512
3030 days ago
Can you recommend an implementation or paper for handling OOV words via character-level embeddings?
2 comments
patelajay285
3030 days ago
We just open sourced an easy-to-use library called Magnitude that handles out-of-vocabulary words and uses Annoy indexing for fast most_similar queries for word2vec, GloVe, and fastText:
https://github.com/plasticityai/magnitude
Sinidir
3026 days ago
Thank you very much. This might help with my master's thesis :)
visarga
3030 days ago
Great work!
fnl
3029 days ago
I agree, and also with your initial comments on the actual benefits and limitations of word vectors, which I can subscribe to 100%.
wenc
3030 days ago
Just an FYI, fastText's default implementation handles OOV words via character n-grams (see switches -minn and -maxn). It also supports word n-grams (see -wordNgrams).
https://fasttext.cc/docs/en/options.html
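For anyone wondering what the character n-gram approach looks like in practice, here is a minimal sketch of the idea, not fastText's actual code. The names char_ngrams and SubwordEmbeddings are hypothetical, the table is random rather than trained, crc32 stands in for whatever hash the real implementation uses, and the minn/maxn defaults are only illustrative.

```python
import random
import zlib


def char_ngrams(word, minn=3, maxn=6):
    """Return all character n-grams of `word`, wrapped in < and > boundary markers."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(token) - n + 1)]


class SubwordEmbeddings:
    """Toy subword table: every n-gram is hashed into a fixed number of buckets."""

    def __init__(self, dim=8, buckets=1000, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.buckets = buckets
        # Random vectors stand in for n-gram vectors learned during training.
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                      for _ in range(buckets)]

    def _bucket(self, gram):
        # Assumption: crc32 is used only because it is deterministic and in the stdlib.
        return zlib.crc32(gram.encode("utf-8")) % self.buckets

    def vector(self, word, minn=3, maxn=6):
        """Build a vector for any word, seen or unseen, by averaging its n-gram vectors."""
        grams = char_ngrams(word, minn, maxn)
        if not grams:
            # Word is too short to produce any n-gram of the requested sizes.
            return [0.0] * self.dim
        rows = [self.table[self._bucket(g)] for g in grams]
        return [sum(col) / len(rows) for col in zip(*rows)]


emb = SubwordEmbeddings()
print(char_ngrams("cat", 3, 3))
print(emb.vector("unseenword"))
```

Because the vector is assembled from n-grams rather than looked up by whole word, a word that never appeared in training still gets a vector, and words that share many n-grams end up with related vectors once the table is actually trained.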