by mlthoughts2018 1993 days ago
The paper itself says the only change is normalizing by the context window size C.
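For context on what "normalizing by C" means, here is a minimal sketch of a CBOW context update. It assumes the hidden vector h is the mean of the C context vectors. Under that assumption, the chain rule gives each context vector a gradient of dL/dh divided by C. Applying dL/dh unscaled to every context vector instead makes each step C times larger. The helper names `cbow_hidden` and `context_updates` are hypothetical illustrations, not the paper's code, `word2vec.c`, or Gensim's API.

```python
def cbow_hidden(context_vecs):
    """Average the C context vectors into the hidden vector h (assumed mean-pooling)."""
    c = len(context_vecs)
    dim = len(context_vecs[0])
    return [sum(v[i] for v in context_vecs) / c for i in range(dim)]


def context_updates(grad_h, num_context, normalize):
    """Return the update applied to each of the C context vectors.

    grad_h: dL/dh (already scaled by the learning rate).
    normalize=False: apply grad_h unscaled to every context vector.
    normalize=True: divide by C, as the chain rule through the mean implies.
    """
    scale = 1.0 / num_context if normalize else 1.0
    return [[g * scale for g in grad_h] for _ in range(num_context)]


if __name__ == "__main__":
    ctx = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # C = 4 toy vectors
    print(cbow_hidden(ctx))
    print(context_updates([0.4, -0.2], len(ctx), normalize=False)[0])
    print(context_updates([0.4, -0.2], len(ctx), normalize=True)[0])
```

With C = 4, the unnormalized update to each context vector is four times the normalized one. That factor is the entire difference the paper describes.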

Ah, but I've now looked at their code, and it's not the only change! They've also eliminated the `reduced_window` method of weighting-by-distance that's present in `word2vec.c`, Gensim, and FastText.
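To show what dropping `reduced_window` removes, here is a minimal Python sketch of reduced-window sampling. It assumes the scheme used in `word2vec.c`: for each center word, draw a random shrink `b` uniformly from `[0, window)` and use `window - b` as the effective window. The function names `context_positions` and `expected_weight` are hypothetical, not part of Gensim, FastText, or the paper's code.

```python
import random


def context_positions(tokens_len, center, window, rng, reduced_window=True):
    """Return indices of context words around `center`.

    With reduced_window=True, an effective window in 1..window is drawn
    uniformly per center word, so nearby words are included more often
    than distant ones. With reduced_window=False, the full window is
    always used, so every position within it counts equally.
    """
    if reduced_window:
        b = rng.randrange(window)   # 0 <= b < window
        effective = window - b      # 1 <= effective <= window
    else:
        effective = window
    start = max(0, center - effective)
    end = min(tokens_len, center + effective + 1)
    return [i for i in range(start, end) if i != center]


def expected_weight(distance, window):
    """Probability that a word `distance` positions away (1..window) is sampled."""
    return (window - distance + 1) / window


if __name__ == "__main__":
    rng = random.Random(0)
    trials = 20000
    # Center at index 5 of an 11-token sentence; index 10 is at distance 5.
    far = sum(10 in context_positions(11, 5, 5, rng) for _ in range(trials))
    print(far / trials, expected_weight(5, 5))
```

Under this scheme, a word at distance d is used with probability (window - d + 1) / window. With `window=5`, adjacent words are always used, and words 5 positions away are used about 20% of the time. That implicit weighting by distance disappears when every context word in the full window is always used.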

What if that's the real reason for their sometimes slightly better, sometimes slightly worse performance on some benchmarks? Perhaps there are other changes, too.

This is why I continue to think Gensim's policy of matching the reference implementations from the original authors, at least by default, is usually the best choice, rather than adopting an alternate interpretation of the often-underspecified papers.