Hacker News new | ask | show | jobs
by AznHisoka 2532 days ago
I'd wager any startup that tries to crawl a few sites like Amazon, Yelp, Linkedin, etc will be blocked. Google, however gets a pass because they're Google. So yes, I believe their huge index, and ability to crawl any site at will is a huge, huge advantage for them.
2 comments

I built a search engine that was able to crawl Amazon and Yelp. The toughest sites were reddit and facebook.
at scale? millions of pages a week? And now? I wrote a crawler that could crawl Amazon as early as a year ago too, but now it doesn't work.
And google sucks at those too.
Amazon lets anyone crawl them, Yelp has a whitelist and no you can't get on it, Linkedin has a whitelist and no you can't get on it, Facebook has a whitelist and no you can't get on it.