Hacker News new | ask | show | jobs
by AznHisoka 3589 days ago
This is why I think Google's position as the #1 search engine will never go away. Many sites will tell your bot to go away if you're not Google. They don't care if you're building a search engine that will compete with Google.
1 comments

At blekko, we did not find this issue to be a significant one... almost everyone who banned our crawler was a crappy over-SEOed website.
https://www.linkedin.com/robots.txt

https://yelp.com/robots.txt

There goes all Linkedin + Yelp content from your index.

What about https://www.facebook.com/robots.txt

..and medium-sized/small sites are even worse.

The irony of Facebook being a core part of all NSA surveillance programs and their terms of service including their "Automated Data Collection Terms" https://www.facebook.com/apps/site_scraping_tos_terms.php

If you surf LinkedIn logged out, you'll see that there isn't very much information available anyway. And there's no money in people search.

Yelp was very responsive when blekko wrote them; as you can see ScoutJet has the same access as googlebot.