by moultano
5803 days ago
>Why aren't sites that scrape content blacklisted? The problem is more difficult than you'd think. For instance, virtually every news organization "scrapes" the Associated Press, but we wouldn't want to throw out every news organization. Content-free search result pages are things we do try to remove, even manually if it becomes a big enough problem.
If they're not adding real value, like analysis or graphics or commentary or whatnot, why would you want to keep them if they're all just duplicates?
I had a friend who worked at a startup trying to solve this exact problem: we read virtually identical articles about the same bit of news on all the news sites. The startup was working on highlighting only the unique bits of each article and recommending the one article that seemed to have the most pieces of information. You would read that one, skim to the unique bits of the others, and get all the angles and facts much more quickly.
Shame they closed it up.
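For what it's worth, here is a rough sketch of that idea. I don't know how the startup actually did it, so everything here is an assumption: the names `sentences` and `compare_articles` are made up, a "piece of information" is crudely taken to be one normalized sentence, and the sample articles are invented. A real system would need fuzzy matching, since rewritten wire copy rarely matches word for word.

```python
import re

def sentences(text):
    """Split text into normalized sentences (lowercased, punctuation dropped)."""
    cleaned = []
    for part in re.split(r"[.!?]+", text):
        norm = " ".join(re.findall(r"[a-z0-9']+", part.lower()))
        if norm:
            cleaned.append(norm)
    return cleaned

def compare_articles(articles):
    """articles maps a name to its text.

    Returns (recommended_name, unique_bits), where unique_bits maps each
    name to the set of sentences found in no other article.
    """
    sents = {name: set(sentences(text)) for name, text in articles.items()}
    unique = {}
    for name, own in sents.items():
        others = set()
        for other_name, other in sents.items():
            if other_name != name:
                others |= other
        unique[name] = own - others
    # Assumption: the article with the most distinct sentences carries the
    # most pieces of information.
    recommended = max(sents, key=lambda name: len(sents[name]))
    return recommended, unique

# Invented sample data, for illustration only.
articles = {
    "site_a": "The bridge closed on Monday. Repairs will take two weeks.",
    "site_b": "The bridge closed on Monday. Repairs will take two weeks. "
              "The mayor blamed budget cuts.",
    "site_c": "The bridge closed on Monday. Local shops expect fewer customers.",
}
recommended, unique = compare_articles(articles)
```

With this sample, you would read site_b in full and then skim only the one sentence that site_c adds. Exact sentence matching is the weakest part of the sketch; swapping in shingling or some other near-duplicate measure would be the obvious next step.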