| HN Mirror


	by rev_bird 4005 days ago
	You're making it sound like the scraping was being done to replace the functionality of CL, which, yeah, would be pretty transparently shitty to do. But they weren't doing that, especially PadMapper: they were indexing the content to make it more accessible, an action that's been taken probably trillions of times and is pretty much the main reason most of the internet is even usable today. It's like accusing Google of plagiarizing your website because they linked to it.


mark-r 4005 days ago

That's a good point - does craigslist have a robots.txt to prevent Google from crawling it? If not, isn't Google guilty of the very same thing, by aggregating the information via search results?

makomk 4005 days ago

Craigslist doesn't prevent Google from crawling them. Not only that, Craigslist also sued at least one company for scraping Google results in order to index Craigslist postings.

rev_bird 4005 days ago

> User-agent: *
> Disallow: /reply
> Disallow: /fb/
> Disallow: /suggest
> Disallow: /flag
> Disallow: /mf
> Disallow: /eaf

Nothing blocking listings... OR PadMapper...
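
For anyone who wants to check this mechanically rather than by eye, here is a rough sketch using Python's standard `urllib.robotparser` against the rules quoted above. The bot name `ExampleBot` and the listing path `/apa/12345.html` are made up for illustration; they are assumptions, not anything taken from Craigslist or PadMapper.

```python
from urllib.robotparser import RobotFileParser

# The rules as quoted in this thread (not fetched live).
ROBOTS_TXT = """\
User-agent: *
Disallow: /reply
Disallow: /fb/
Disallow: /suggest
Disallow: /flag
Disallow: /mf
Disallow: /eaf
"""


def is_allowed(path, user_agent="ExampleBot"):
    """Return True if the quoted rules let `user_agent` fetch `path`.

    `user_agent` is a hypothetical crawler name; the rules use a
    wildcard, so any name is treated the same way.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical listing path: not covered by any Disallow prefix.
    print(is_allowed("/apa/12345.html"))
    # Reply pages are covered by "Disallow: /reply".
    print(is_allowed("/reply/12345"))
```

The rules are simple path prefixes applied to every user agent, so a crawler that honors them is only kept out of the reply, flag, and similar pages, not the listings themselves.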
