by rev_bird 4005 days ago
You're making it sound like the scraping was being done to replace the functionality of CL, which, yeah, would be pretty transparently shitty to do. But they weren't doing that, especially PadMapper: They were indexing the content to make it more accessible, an action that's been taken probably trillions of times and is pretty much the main reason most of the internet is even usable today. It's like accusing Google of plagiarizing your website because they linked to it.

That's a good point - does Craigslist have a robots.txt to prevent Google from crawling it? If not, isn't Google guilty of the very same thing, by aggregating the information via search results?
Craigslist doesn't prevent Google from crawling them. Not only that, Craigslist also sued at least one company for scraping Google results in order to index Craigslist postings.
> User-agent: *
> Disallow: /reply
> Disallow: /fb/
> Disallow: /suggest
> Disallow: /flag
> Disallow: /mf
> Disallow: /eaf

Nothing blocking listings... OR PadMapper...
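
For anyone who wants to check this themselves, here's a rough sketch using Python's standard urllib.robotparser against the rules quoted above. The listing path and the "PadMapper" user-agent string are made up for illustration (I don't know what either actually looked like), and the rules are just the ones pasted in this thread, not a live fetch.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in this thread (not fetched live; they may have changed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /reply
Disallow: /fb/
Disallow: /suggest
Disallow: /flag
Disallow: /mf
Disallow: /eaf
"""


def is_allowed(path, user_agent="*"):
    """Return True if the quoted rules permit user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical listing path and crawler name, for illustration only.
    print(is_allowed("/apa/12345.html", "PadMapper"))
    # A path under one of the disallowed prefixes.
    print(is_allowed("/reply/12345", "PadMapper"))
```

Since the only group is "User-agent: *", every crawler gets the same answer: anything outside those six prefixes is allowed.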