by avinassh 3990 days ago
> Compute pair-wise Jaccard similarity of all articles with each other and output the articles whose title similarity is greater than 0.5.

If I understand correctly, it only checks the similarity between titles. How effective would it be if it also checked the similarity between the article contents? Sometimes articles don't have similar-looking titles but are talking about the same thing.

Example: [0], [1], [2]
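
For reference, here is a rough sketch of what the quoted step might look like. It is not the author's actual code: I am assuming plain lowercase whitespace tokenization, the function names `jaccard` and `similar_pairs` are made up, and the sample titles are invented for illustration.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0  # assumption: treat two empty texts as not similar
    return len(sa & sb) / len(sa | sb)


def similar_pairs(texts, threshold=0.5):
    """Return (i, j, score) for every pair scoring above the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        score = jaccard(a, b)
        if score > threshold:
            pairs.append((i, j, score))
    return pairs


# Invented titles, not taken from the linked articles.
titles = [
    "google shares jump after earnings",
    "google shares surge after earnings",
    "a completely unrelated headline",
]
print(similar_pairs(titles))
```

The same function would accept article bodies instead of titles, which is the comparison I am asking about. For long texts a plain word-set overlap is probably too crude, and something like stop-word removal or shingling would likely be needed first.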

OT: You can use the Python package haxor to fetch articles from HN [3]. Disclaimer: I wrote it.

[0] - http://www.bloomberg.com/news/articles/2015-07-17/google-app...

[1] - http://www.cnbc.com/2015/07/17/googles-one-day-rally-is-the-...

[2] - http://www.bbc.com/news/business-33572959

[3] - http://github.com/avinassh/haxor