https://yelp.com/robots.txt
There goes all LinkedIn + Yelp content from your index.
...and small and medium-sized sites are even worse.
The irony is that Facebook is a core part of all NSA surveillance programs, yet its terms of service include "Automated Data Collection Terms": https://www.facebook.com/apps/site_scraping_tos_terms.php
Yelp was very responsive when blekko wrote to them; as you can see, ScoutJet has the same access as Googlebot.
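For anyone who wants to check this kind of thing themselves, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt text below is a made-up sample for illustration, not Yelp's actual file, and the `allowed` helper and the example.com URLs are my own assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (NOT Yelp's real file):
# two named crawlers get the same rules, everyone else is shut out.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin

User-agent: ScoutJet
Disallow: /admin

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/biz/some-cafe"  # hypothetical URL
    for agent in ("Googlebot", "ScoutJet", "SomeOtherBot"):
        print(agent, allowed(SAMPLE_ROBOTS_TXT, agent, page))
```

To test a live site instead, `RobotFileParser` also has `set_url()` and `read()`, which fetch the file over the network.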