Hacker News
by laretluval 3112 days ago
word2vec has the advantage that you could potentially identify spam messages that are paraphrases rather than exact copies of the ones in the training set.
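A minimal sketch of the paraphrase idea, assuming each message is represented by the average of its word vectors and compared to known spam with cosine similarity. The tiny `TOY_VECTORS` table is hypothetical and hand-made so that synonyms sit close together. It stands in for real pretrained embeddings (word2vec or GloVe) and is not what the project actually uses. `message_vector`, `cosine` and `spam_similarity` are illustrative names, not part of any library.

```python
import math

# Hypothetical 3-d "embeddings", hand-made for illustration only (NOT real GloVe
# or word2vec values). Synonyms are deliberately placed close together.
TOY_VECTORS = {
    "buy": [0.9, 0.1, 0.0],
    "purchase": [0.85, 0.15, 0.05],
    "cheap": [0.1, 0.9, 0.0],
    "inexpensive": [0.15, 0.85, 0.05],
    "pills": [0.0, 0.2, 0.9],
    "meeting": [-0.7, 0.0, 0.1],
    "agenda": [-0.8, 0.1, -0.1],
    "tomorrow": [-0.5, -0.3, 0.2],
}


def message_vector(text, vectors):
    """Average the vectors of known words; out-of-vocabulary words are skipped."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None  # no word of the message is in the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def spam_similarity(message, known_spam, vectors=TOY_VECTORS):
    """Highest cosine similarity between a message and any known spam message."""
    m = message_vector(message, vectors)
    if m is None:
        return 0.0
    spam_vecs = (message_vector(s, vectors) for s in known_spam)
    return max((cosine(m, v) for v in spam_vecs if v is not None), default=0.0)


if __name__ == "__main__":
    spam = ["buy cheap pills"]
    print(spam_similarity("purchase inexpensive pills", spam))  # paraphrase: close to 1
    print(spam_similarity("meeting agenda tomorrow", spam))     # unrelated: low
    print(spam_similarity("v1agra", spam))                      # only unknown words: 0.0
```

The paraphrase shares just one word ("pills") with the training spam, yet scores high because its other words have nearby vectors. A message made only of out-of-vocabulary tokens such as "v1agra" gets no signal at all. That is the weakness of using vectors whose vocabulary does not contain the obfuscated spellings.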

1. Pedantically: it's GloVe, not word2vec.
2. Nilsimsa, or any other locality-sensitive hash, detects changed messages too, whether or not the changes are synonyms.
3. I don't think OP's GloVe vectors contain words like v1agra.
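For comparison, here is a minimal sketch of near-duplicate detection with a locality-sensitive technique. Nilsimsa itself is a digest-based scheme and is not in the Python standard library, so this swaps in a different LSH method, MinHash over character 4-grams. MinHash estimates the Jaccard similarity of the two shingle sets. `shingles`, `minhash_signature` and `estimated_jaccard` are hypothetical helper names, and the SHA-1-with-seed hash family is an illustrative choice rather than a tuned implementation.

```python
import hashlib


def shingles(text, k=4):
    """Set of overlapping character k-grams (lower-cased, whitespace collapsed)."""
    t = " ".join(text.lower().split())
    if len(t) < k:
        return {t}
    return {t[i:i + k] for i in range(len(t) - k + 1)}


def _hash(seed, s):
    """One member of a seeded hash family: first 8 bytes of SHA-1(seed:s)."""
    digest = hashlib.sha1(f"{seed}:{s}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(text, num_hashes=128, k=4):
    """For each seeded hash function, keep the minimum hash over all shingles."""
    sh = shingles(text, k)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    original = minhash_signature("Buy cheap v1agra now, limited offer")
    edited = minhash_signature("Buy cheap v1agra today, limited offer")
    unrelated = minhash_signature("Minutes from Tuesday's project meeting are attached.")
    print(estimated_jaccard(original, edited))     # fairly high: a lightly edited copy
    print(estimated_jaccard(original, unrelated))  # near 0
```

Unlike averaged word vectors, this needs no vocabulary, so "v1agra" is handled like any other string. The trade-off is that it only catches edits that leave much of the text intact. Replacing most words with synonyms would lower the score.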
We don't have words like v1agra. As I mentioned in the README, we used vectors pretrained on Wikipedia. One possible improvement would be to train the vectors on our own dataset.