Hacker News
by nl 4226 days ago
The paper compares in detail against word2vec, but (spoiler alert) GloVe using 42 billion tokens from Common Crawl beats word2vec using 100 billion tokens from the Google News corpus!

Damn!!

Background for those who don't follow this field: word2vec is an apparently miraculous demonstration, and the poster child, of the unreasonable effectiveness of big data. Beating it at all is impressive, assuming GloVe's performance holds up across different metrics as robustly as word2vec's does.

Beating it with only 42% of the tokens is wondrous.