HN Mirror



	by tempestn 4526 days ago
	The other huge problem here is that Google's FeedFetcher doesn't respect robots.txt. (Their reasoning is that it is acting at the direct request of a human to retrieve a specific resource, so it doesn't count as a bot.) Because of this, there is no easy way to stop it from hitting your site.

1 comments

HNaTTY 4526 days ago

You can block the user agent; I believe "Feedfetcher-google" should work.
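
For illustration, here is a minimal sketch of what blocking by user agent might look like in a Python WSGI app. It assumes the crawler's User-Agent header contains the string quoted above (matched case-insensitively). `BlockUserAgent`, `demo_app`, and `status_for` are made-up names for this example, not part of any library.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: the crawler identifies itself with this substring.
BLOCKED_AGENT = "feedfetcher-google"


class BlockUserAgent:
    """WSGI middleware that returns 403 when the User-Agent contains `blocked`."""

    def __init__(self, app, blocked=BLOCKED_AGENT):
        self.app = app
        self.blocked = blocked.lower()

    def __call__(self, environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if self.blocked in agent:
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Any other client is passed through to the wrapped app.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    body = b"ok\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def status_for(agent):
    """Run one fake request through the middleware and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = agent
    seen = []
    BlockUserAgent(demo_app)(environ, lambda status, headers: seen.append(status))
    return seen[0]
```

In practice the same check would more likely live in the web server or firewall configuration; the middleware form just makes the logic easy to read and test.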


tempestn 4525 days ago

True, but (while possible) it's not straightforward to block access to specific files only. The same user agent is also used for Google Custom Search if you're using that. And it's still going to be hammering your firewall (although admittedly that's less catastrophic than trying to download a 10MB file repeatedly).
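
To make the "specific files only" case concrete, here is a rough sketch of the decision rule in Python. It assumes the crawler's User-Agent contains "feedfetcher-google" and that the large files sit under a known path prefix or share a few extensions. `PROTECTED_PREFIXES`, `PROTECTED_SUFFIXES`, and `should_block` are hypothetical names and values chosen for the example.

```python
from urllib.parse import urlsplit

# Assumption: the crawler identifies itself with this substring.
BLOCKED_AGENT = "feedfetcher-google"

# Hypothetical: where the large files live on this particular site.
PROTECTED_PREFIXES = ("/downloads/",)
PROTECTED_SUFFIXES = (".zip", ".iso", ".pdf")


def should_block(user_agent, request_target):
    """Return True only when the crawler asks for one of the protected files."""
    if BLOCKED_AGENT not in user_agent.lower():
        return False
    # Drop the query string so "?x=1" style cache busters do not evade the rule.
    path = urlsplit(request_target).path.lower()
    return path.startswith(PROTECTED_PREFIXES) or path.endswith(PROTECTED_SUFFIXES)
```

This leaves feeds and ordinary pages reachable for the same user agent, which matters if Google Custom Search shares it. It does not reduce the number of requests arriving; it only turns an expensive download into a cheap refusal.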
