by suicas
1912 days ago
A company I worked for ~7 years ago ran its own focused web crawler (fetching ~10-100M pages per month, targeting certain sections of the web). There were a surprising number of sites out there that explicitly blocked access to anyone but Google/Bing at the time. We'd also get a dozen or so complaints a month from sites we'd crawled, mostly upset about us using up their bandwidth and telling us that only Google was allowed to crawl them (though they had no robots.txt configured to say so).
If Google is taking traffic and reducing revenue, a company can deny it in robots.txt. Google will actually follow those rules, unlike most others that are supposedly in this second class.
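
For anyone who hasn't written one, here is a rough sketch of what "deny in robots.txt" could look like, checked with Python's standard `urllib.robotparser`. The rules, the `can_fetch` helper, the `SomeOtherBot` name, and the example.com URL are all made up for illustration. Keep in mind robots.txt is only advisory: it works against crawlers that choose to honor it, and each crawler applies its own matching rules, so this parser is just an approximation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Googlebot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines; no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative user agents and URL (assumptions, not from the thread).
print(can_fetch("Googlebot", "https://example.com/page"))     # False
print(can_fetch("SomeOtherBot", "https://example.com/page"))  # True
```

Swapping the two rule groups (an empty `Disallow:` for Googlebot and `Disallow: /` for `*`) would express the "only Google may crawl" policy that the sites mentioned above apparently never wrote down.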