As I understand it, no one has come close to the claimed results in s3.1 ("Sentiment Analysis with the Stanford Sentiment Treebank Dataset") and people have come closer but still not matched those in s3.2 ("Beyond One Sentence: Sentiment Analysis with IMDB dataset").

A thread where Mikolov is trying to help others with his patch to `word2vec.c` demoing a tiny bit of 'Paragraph Vector' – but reaches the limits of what he understands Le to have done – is: https://groups.google.com/g/word2vec-toolkit/c/Q49FIrNOQRo/m...

My own frustration (& reco to avoid the thin stanfordSentimentTreebank/RottenTomatoes data/results) is mentioned at: https://groups.google.com/g/word2vec-toolkit/c/ubFrO0a9Pe8/m...

I'd say that "concatenating PV-DBOW and (plain averaging) PV-DM" never seems to offer much lift compared to the favorable way it's described in the paper and other Le comments. And after spending a bunch of time implementing the "PV-DM with concatenation (rather than sum/average) of many word-vectors as the context", as I interpret the paper's description "To predict the 8-th word, we concatenate the paragraph vectors and 7 word vectors", I've only seen it massively increase model size & training time for very little advantage.
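To make the size blow-up concrete, here's a back-of-envelope sketch of how the prediction layer's input width changes between averaging and concatenation. It assumes one paragraph vector plus 7 context word vectors (per the paper's example) and one output row per vocabulary word. The vocabulary size and vector size are made-up illustrative numbers, and `output_layer_weights` is just a hypothetical helper, not any library's API.

```python
def output_layer_weights(vocab_size, vector_size, context_words, concat):
    """Count prediction-layer weights for a PV-DM-style model.

    Assumption: one paragraph vector plus `context_words` word vectors
    feed the prediction layer, with one output row per vocabulary word.
    """
    if concat:
        # Concatenation stacks the paragraph vector and every context
        # word vector end to end, so the input width grows with the window.
        input_width = vector_size * (1 + context_words)
    else:
        # Summing/averaging collapses everything to a single vector.
        input_width = vector_size
    return vocab_size * input_width


VOCAB_SIZE = 100_000  # hypothetical vocabulary size
VECTOR_SIZE = 400     # hypothetical vector dimensionality
CONTEXT_WORDS = 7     # "7 word vectors", per the paper's description

averaged = output_layer_weights(VOCAB_SIZE, VECTOR_SIZE, CONTEXT_WORDS, concat=False)
concatenated = output_layer_weights(VOCAB_SIZE, VECTOR_SIZE, CONTEXT_WORDS, concat=True)

print(f"averaging:     {averaged:,} weights")
print(f"concatenation: {concatenated:,} weights")
print(f"ratio:         {concatenated // averaged}x")
```

Under those assumptions the prediction layer is 8x larger with concatenation, before counting any extra training time per example.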

Reddit searches on related topics turn up other anecdotes/resentments; eg:

https://www.reddit.com/r/MachineLearning/comments/18jzxpf/co...

https://www.reddit.com/r/MachineLearning/comments/hkiyir/com...

At a certain level, with so much "gold in them thar hills" in followup work – from both academic & commercial perspectives – I don't blame Le for rushing forward to other related fertile ideas, & ignoring the requests-for-explanation.

But there's something sloppy or fishy (hiding secret tweaks?) in the originally-claimed PV results, which has wasted a lot of time among those trying to understand & reproduce.