by pierrefar 5808 days ago
virtually every news organization "scrapes" the associated press

If they're not adding real value, like analysis or graphics or commentary or whatnot, why would you want to keep them if they're all just duplicates?

I had a friend who worked at a startup trying to solve this exact problem: we read virtually identical articles about the same bit of news on all the news sites. The startup was working on highlighting only the unique bits of each article and recommending the one article that seemed to have the most pieces of information. You would read that one, skim the unique bits of the others, and get all the angles and facts much more quickly.
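For the curious, here is a rough sketch of how that idea could work. This is my own illustration, not the startup's actual method: it assumes a "piece of information" is just a normalized sentence, and the names `sentences` and `compare_articles` are made up for the example.

```python
import re


def sentences(text):
    """Split text into normalized sentences (lowercased, punctuation stripped)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    normalized = []
    for part in parts:
        key = re.sub(r"[^a-z0-9 ]", "", part.lower()).strip()
        if key:
            normalized.append(key)
    return normalized


def compare_articles(articles):
    """articles maps a name to its text.

    Returns (recommended_name, unique_by_name), where unique_by_name maps each
    article to the sentences that appear in no other article.
    Assumption: exact match after normalization stands in for "same fact".
    """
    sets = {name: set(sentences(text)) for name, text in articles.items()}
    # Recommend the article with the most distinct sentences.
    best = max(sets, key=lambda name: len(sets[name]))
    unique = {}
    for name, sents in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        unique[name] = sents - others
    return best, unique


articles = {
    "a": "The mayor resigned. The vote is Tuesday. Turnout was low.",
    "b": "The mayor resigned. A recount was requested.",
}
best, unique = compare_articles(articles)
```

A real system would need fuzzy matching, since rewritten wire copy rarely repeats sentences word for word, but the read-one-and-skim-the-rest flow is the same.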

Shame they closed it up.

1 comment

We do filter near-duplicates within the same set of results. You'll likely see only one copy of an AP story, with a link at the bottom saying something like "Repeat this search with the omitted results included."
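To give a feel for what near-duplicate filtering within one result set can look like, here is a minimal sketch using word shingles and Jaccard similarity. This is a generic textbook technique, not a description of the actual production filter; the function names, the shingle size `k=3`, and the `threshold=0.8` cutoff are all assumptions picked for illustration.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_near_duplicates(results, threshold=0.8):
    """Keep the first copy of each story; set aside later near-copies.

    Returns (kept, omitted) so the omitted results can still be shown on request.
    """
    kept, omitted, kept_shingles = [], [], []
    for text in results:
        sh = shingles(text)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            omitted.append(text)
        else:
            kept.append(text)
            kept_shingles.append(sh)
    return kept, omitted


results = [
    "the senate passed the budget bill on friday after a long debate",
    "the senate passed the budget bill on friday after a long debate today",
    "local team wins the championship game in overtime",
]
kept, omitted = filter_near_duplicates(results)
```

Comparing every result against every kept result is fine for one page of results; at larger scale you would want something like hashing the shingles into compact sketches instead.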