Hacker News
by tempestn 4479 days ago
The other huge problem here is that Google's FeedFetcher doesn't respect robots.txt. (Their reasoning is that it is acting at the direct request of a human to retrieve a specific resource, so it doesn't count as a bot.) Because of this, there is no easy way to stop it from hitting your site.

You can block the user agent; I believe "Feedfetcher-google" should work.
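As a rough illustration of user-agent blocking, here is a minimal sketch that assumes a Python WSGI application. The middleware class and helper names are hypothetical. The comment only gives "Feedfetcher-google", so the exact capitalization Google sends is treated as unknown and the match is a case-insensitive substring check.

```python
from typing import Callable, Iterable

# Assumption: match "feedfetcher-google" anywhere in the User-Agent, ignoring case,
# because the exact UA string Google sends is not given in the thread.
BLOCKED_UA_SUBSTRING = "feedfetcher-google"


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent looks like Google's FeedFetcher."""
    return BLOCKED_UA_SUBSTRING in user_agent.lower()


class BlockFeedFetcher:
    """Hypothetical WSGI middleware that answers FeedFetcher requests with 403."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(user_agent):
            # Short-circuit before the wrapped app (and any large file) is touched.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return self.app(environ, start_response)
```

You would wrap an existing app as `app = BlockFeedFetcher(app)`. Because this blocks the whole site for that user agent, it also cuts off anything else that shares the same user agent.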
True, but while it's possible, it's not straightforward to block access to only specific files. The same user agent is also used for Google Custom Search, if you're using that. And it will still be hammering your firewall (although admittedly that's less catastrophic than repeatedly trying to download a 10 MB file).
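Blocking only specific files means combining the user-agent check with a path check. The sketch below shows one way to do that. The function name and the list of "large" file extensions are hypothetical, and it assumes `path` is the request path without a query string (for example, WSGI `PATH_INFO`). As noted above, it does not stop the requests from arriving. It only avoids serving the big files.

```python
from pathlib import PurePosixPath

# Assumption: case-insensitive substring match on the User-Agent.
BLOCKED_UA_SUBSTRING = "feedfetcher-google"

# Hypothetical example: extensions of files too large to let the fetcher pull repeatedly.
BLOCKED_SUFFIXES = frozenset({".mp3", ".zip", ".pdf"})


def should_block(user_agent: str, path: str) -> bool:
    """Block FeedFetcher only for paths ending in one of BLOCKED_SUFFIXES."""
    if BLOCKED_UA_SUBSTRING not in user_agent.lower():
        return False
    # Compare the file extension case-insensitively, e.g. ".ZIP" -> ".zip".
    return PurePosixPath(path).suffix.lower() in BLOCKED_SUFFIXES
```

Feed URLs and ordinary pages still get through, so anything else that uses the same user agent (such as Custom Search) keeps working everywhere except on the listed file types.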