Hacker News
by Bill_Dimm 4113 days ago
Good observation -- I missed that (obviously). They seem to be using data from the word2vec project, so I would guess that it is intentional rather than a lack of cleaning.