by smt88 2634 days ago
Craigslist is famously aggressive about blocking scraping, too. While helping with some university research, I found that their anti-scraping defenses were really difficult to get around.

They've also been litigious about this in the past, so this project would undoubtedly get a cease-and-desist if it ever became a blip on their radar.


I'm guessing that's because of Airbnb and how they got their start by essentially stealing posts from Craigslist.
I thought PadMapper got their start that way, by essentially providing a better apartment hunt interface than CL had at the time.
Which was a shame, because PadMapper put the listings on a map and linked back to Craigslist, instead of just taking the content. If Google Maps did it, nobody would think twice.
Craigslist might make them think twice. I don't think they want to overly exhaust their own resources; what they have works for them.
And they got the banhammer, too.

Not too hard to see where the fulcrum of this "settlement" is: https://www.eff.org/deeplinks/2015/06/padmapper-and-3taps-se...

> I found that their anti-scraping defenses were really difficult to get around

What sort of defenses are they using?

They had fantastic detection of headless Chrome and curl requests, a thorough IP blacklist, aggressive rate limiting, and possibly some JS stuff.
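
To make that list concrete, here is a rough sketch of how a server could layer those checks: an IP blocklist, a User-Agent filter for curl and headless Chrome, and a per-IP token-bucket rate limit. This is purely illustrative and not Craigslist's actual implementation; the names `check_request`, `BLOCKED_IPS`, and `BOT_UA_MARKERS`, the example IP, and the limits are all made up.

```python
import time
from dataclasses import dataclass, field

# Hypothetical values for illustration; not taken from any real site.
BLOCKED_IPS = {"203.0.113.7"}
BOT_UA_MARKERS = ("curl/", "headlesschrome", "python-requests")


@dataclass
class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_sec`."""
    capacity: float
    refill_per_sec: float
    tokens: float = field(init=False)
    updated: float = 0.0

    def __post_init__(self):
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


_buckets = {}  # one bucket per client IP


def check_request(ip, user_agent, now=None):
    """Return "ok" or the reason the request would be rejected."""
    now = time.monotonic() if now is None else now
    if ip in BLOCKED_IPS:
        return "blocked: ip"
    ua = user_agent.lower()
    if any(marker in ua for marker in BOT_UA_MARKERS):
        return "blocked: user-agent"
    bucket = _buckets.setdefault(ip, TokenBucket(capacity=3, refill_per_sec=1.0))
    if not bucket.allow(now):
        return "blocked: rate limit"
    return "ok"
```

Real defenses presumably go well beyond this (the "JS stuff" would be client-side fingerprinting, which a User-Agent check can't replicate), but even this much stops a naive curl loop.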
I was a contractor at a company where one of the divisions did Craigslist scraping as part of its business model, and they had a closet full of laptops doing the scraping. Thirdhand, what I heard was that separate PCs were necessary for the anti-scraping workaround they were using.