by blueboo
911 days ago
This is interesting. I went off and searched for paragraph vector code and indeed found doc2vec stuff, including tutorials referring to the paper such as https://radimrehurek.com/gensim/auto_examples/howtos/run_doc.... It’s not obvious that the results aren’t reproducible (and I realise code is not the same as published results), but I wonder if you could steer us more specifically.
|
A thread where Mikolov is trying to help others with his patch to `word2vec.c` demoing a tiny bit of 'Paragraph Vector' – but reaches the limits of what he understands Le to have done – is: https://groups.google.com/g/word2vec-toolkit/c/Q49FIrNOQRo/m...
My own frustration (& reco to avoid the thin stanfordSentimentTreebank/RottenTomatoes data/results) is mentioned at: https://groups.google.com/g/word2vec-toolkit/c/ubFrO0a9Pe8/m...
I'd say that "concatenating PV-DBOW and (plain averaging) PV-DM" never seems to offer much lift compared to the favorable way it's described in the paper and other Le comments. And after spending a bunch of time implementing the "PV-DM with concatenation (rather than sum/average) of many word-vectors as the context", as I interpret the paper's description "To predict the 8-th word, we concatenate the paragraph vectors and 7 word vectors", I've only seen it massively increase model size & training time for very little advantage.
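For a rough sense of why the concatenating variant gets so much bigger, here is a back-of-the-envelope sketch, not a measurement. It assumes one output weight vector per vocabulary word (as with negative sampling or a flat softmax), and the vocabulary size and vector size are made-up illustrative numbers; only the 7 context words come from the paper's description quoted above.

```python
def projection_width(vector_size: int, context_words: int, concat: bool) -> int:
    """Width of the vector that PV-DM feeds to its output layer."""
    if concat:
        # One paragraph vector plus every context word vector, laid end to end.
        return (1 + context_words) * vector_size
    # Sum/average collapses all inputs into a single vector of the same size.
    return vector_size


def output_layer_weights(vocab_size: int, vector_size: int,
                         context_words: int, concat: bool) -> int:
    """Number of output-layer weights, assuming one weight vector per word."""
    return vocab_size * projection_width(vector_size, context_words, concat)


VOCAB_SIZE = 100_000   # hypothetical vocabulary size
VECTOR_SIZE = 100      # hypothetical embedding dimensionality
CONTEXT_WORDS = 7      # "we concatenate the paragraph vectors and 7 word vectors"

averaged = output_layer_weights(VOCAB_SIZE, VECTOR_SIZE, CONTEXT_WORDS, concat=False)
concatenated = output_layer_weights(VOCAB_SIZE, VECTOR_SIZE, CONTEXT_WORDS, concat=True)
ratio = concatenated // averaged

print(f"averaging:     {averaged:,} output weights")
print(f"concatenation: {concatenated:,} output weights ({ratio}x)")
```

Under these assumptions the output layer alone grows by a factor of (1 + number of context words), and each training step touches correspondingly wider vectors. In gensim's `Doc2Vec` the two modes roughly correspond to `dm=1` with the default averaging versus `dm=1, dm_concat=1`.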
Reddit searches on related topics turn up other anecdotes/resentments; eg:
https://www.reddit.com/r/MachineLearning/comments/18jzxpf/co...
https://www.reddit.com/r/MachineLearning/comments/hkiyir/com...
At a certain level, with so much "gold in them thar hills" in followup work – from both academic & commercial perspectives, I don't blame Le for rushing forward to other related fertile ideas, & ignoring the requests-for-explanation.
But there's something sloppy or fishy (hiding secret tweaks?) in the originally-claimed PV results, which has wasted a lot of time among those trying to understand & reproduce.