Hacker News
by argumentum 4470 days ago
> The issue is I want to respect their TOS.

Permitting certain search engines to crawl but not others is anticompetitive and violates the principle of an open web.

> any new search engine will have its roots in "shady" tactics of crawling.

Who cares? If you are successful enough, then you'll negotiate with them later. Don't worry about it now.

Crawl away!

3 comments

http://yelp.com/robots.txt

It seems you can contact Yelp and tell them how you plan to use their data and maybe they'll let you crawl their site.

I really want to explore any and all alternative options before I decide to crawl away :)
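For anyone who wants to see what a robots.txt file permits before deciding either way, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT` are made up for illustration and are not Yelp's actual file, and the `is_allowed` helper, the `MyNewCrawler` user agent, and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the contents of any real site's robots.txt.
# One named crawler gets broad access, every other user agent is shut out.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "MyNewCrawler"):
        verdict = is_allowed(SAMPLE_ROBOTS_TXT, agent, "https://example.com/biz/foo")
        print(agent, "allowed" if verdict else "disallowed")
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()`, which download the file before `can_fetch()` is called. Note that robots.txt only states the site's crawling preferences; it says nothing about what the TOS allows.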

Are you a hacker, or aren't you? Dumb rules exist to be broken.
Agreed, but there's no harm in tapping the vast knowledge of the HN community to exhaust alternatives before tightening my hacker cap and plunging in head first.
Yelp is going after startups that crawl their site [1]. I don't want to make Yelp the poster site for this because other big sites do this too. [1] http://www.courthousenews.com/2012/01/27/43403.htm
Agreed. 99% of the time these big companies don't care. Also, use proxies.

If they come knocking on your door to take down the content, then worry about it.