HN Mirror



	by laretluval 3112 days ago
	word2vec has the advantage that you could potentially identify spam messages that are paraphrases rather than exact copies of the ones in the training set.
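
To make the paraphrase point concrete, here is a minimal sketch that averages word vectors per message and compares messages by cosine similarity. The three-dimensional `TOY_VECTORS` are invented for illustration and are not real word2vec or GloVe values; `embed` and `cosine` are hypothetical helper names, not anything from OP's repo.

```python
import math

# Hypothetical toy vectors, made up for this sketch. Real pretrained
# embeddings have hundreds of dimensions.
TOY_VECTORS = {
    "cheap": (0.9, 0.1, 0.0),
    "inexpensive": (0.8, 0.2, 0.0),
    "pills": (0.1, 0.9, 0.0),
    "tablets": (0.2, 0.8, 0.1),
    "meeting": (0.0, 0.1, 0.9),
    "tomorrow": (0.1, 0.0, 0.8),
}


def embed(message, vectors):
    """Average the vectors of the known words; None if no word is known."""
    known = [vectors[w] for w in message.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


spam = embed("cheap pills", TOY_VECTORS)
paraphrase = embed("inexpensive tablets", TOY_VECTORS)
unrelated = embed("meeting tomorrow", TOY_VECTORS)

print(cosine(spam, paraphrase))  # close to 1: no shared words, similar meaning
print(cosine(spam, unrelated))   # close to 0
```

The two spam-like messages share no tokens, yet their averaged vectors point in nearly the same direction, which is the property an exact-match filter lacks.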

1 comments

mci 3112 days ago

1. Pedantically: it's GloVe, not word2vec. 2. Nilsimsa or any other locality-sensitive hash also detects altered messages, whether or not the changes are synonyms. 3. I don't think OP's GloVe vocabulary contains words like v1agra.
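
For point 2, a rough sketch of the locality-sensitive idea. This is not Nilsimsa itself; it swaps in a SimHash over character trigrams, which is simpler to write down and has the same property that small edits move the hash only a little. The function names and the sample messages are assumptions made for illustration.

```python
import hashlib


def trigrams(text):
    """Character trigrams of the lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return [text[i:i + 3] for i in range(max(len(text) - 2, 1))]


def simhash(text, bits=64):
    """SimHash: each trigram votes +1/-1 on every bit of the fingerprint."""
    counts = [0] * bits
    for gram in trigrams(text):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


original = "Buy cheap viagra now, limited offer, click the link below today"
altered = "Buy cheap v1agra now, limited offer, click the link below today"
unrelated = "Minutes from the quarterly planning meeting are attached here"

near = hamming(simhash(original), simhash(altered))
far = hamming(simhash(original), simhash(unrelated))
print(near, far)  # near is expected to be much smaller than far
```

Changing one character only disturbs the few trigrams that contain it, so most bit votes stay the same. A filter would flag a message whose fingerprint is within some Hamming threshold of known spam; the threshold is a tuning choice not covered here.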


doody_parizada 3112 days ago

We don't have words like v1agra. As I mentioned in the README, we used vectors pretrained on Wikipedia. One possible improvement would be to train the vectors on our own dataset.
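
A quick way to see the gap is to list the tokens of a message that a vocabulary does not cover. This is only a sketch: `TOY_VOCAB` is a made-up stand-in for the pretrained vocabulary, and `out_of_vocabulary` is a hypothetical helper, not part of the project.

```python
def out_of_vocabulary(message, vocab):
    """Return the lowercased tokens of the message that are missing from vocab."""
    return [tok for tok in message.lower().split() if tok not in vocab]


# Hypothetical stand-in for a vocabulary learned from encyclopedia text.
TOY_VOCAB = {"buy", "cheap", "viagra", "now"}

missing = out_of_vocabulary("Buy cheap v1agra now", TOY_VOCAB)
print(missing)  # ['v1agra']
```

Tokens that show up here contribute nothing to the message vector, which is the case training on the spam dataset itself would address.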
