Hacker News
by lubujackson 5171 days ago
Not true. It fetches the URL every single hour, not just when the user requests it. So Google is claiming they can ignore robots.txt because it was an action performed by a user (true), but they're unleashing a huge problem with this background refreshing. Google is wasting gobs of their own money, too. What if I made a bot that generated 1000s of Google accounts with 1000s of spreadsheets hotlinking 1000s of big files stored on S3? This one guy's one file did TERABYTES of transfers over a week. The underlying problem is that Google is relying on the domain name to indicate the company size, and thus the bandwidth allocation for this service.
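Some rough numbers for the hourly-refresh concern: a one-hour interval means 24 × 7 = 168 fetches per week for each spreadsheet that links the file, assuming the refresh runs whether or not anyone opens the sheet. The sketch below uses a hypothetical 10 MB file. The sizes and spreadsheet counts are illustrative inputs, not figures from the original report.

```python
# Rough estimate of how periodic background refreshing multiplies transfer volume.
# All file sizes and spreadsheet counts here are hypothetical inputs.

HOURS_PER_WEEK = 24 * 7  # 168 refreshes per week at a one-hour interval


def weekly_transfer_bytes(file_size_bytes: int, spreadsheets: int,
                          refresh_interval_hours: float = 1.0) -> float:
    """Bytes fetched per week if every spreadsheet re-downloads the file on each refresh."""
    refreshes_per_week = HOURS_PER_WEEK / refresh_interval_hours
    return file_size_bytes * spreadsheets * refreshes_per_week


if __name__ == "__main__":
    ten_mb = 10 * 10**6
    # One spreadsheet hotlinking a hypothetical 10 MB file: 168 fetches, about 1.68 GB/week.
    print(weekly_transfer_bytes(ten_mb, spreadsheets=1) / 10**9, "GB")
    # 1,000 such spreadsheets: about 1.68 TB/week from refreshing alone.
    print(weekly_transfer_bytes(ten_mb, spreadsheets=1000) / 10**12, "TB")
```

The cost scales linearly in all three factors. That is why a single small file, linked from enough auto-refreshing sheets, can plausibly reach terabytes per week.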

Background refreshing is a common feature in client applications, like RSS readers. I think their reasoning makes sense.

I do think they should change their process (making it lazy-load instead), but that's a different issue to robots.txt.
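Here is a minimal sketch of what "lazy-load" could mean in this case. It assumes the goal is to fetch the remote file only when someone actually reads the spreadsheet, and to reuse a cached copy for up to an hour. `LazyFetcher`, its `fetch` callback, and the one-hour TTL are hypothetical names and values. This is not a description of how Google Docs actually works.

```python
import time
from typing import Callable, Optional, Tuple


class LazyFetcher:
    """Fetch a remote resource only when a reader asks for it, reusing the cached
    copy for up to ttl_seconds. Nothing is downloaded while nobody is reading."""

    def __init__(self, fetch: Callable[[], bytes], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical download function, e.g. an HTTP GET
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the example can simulate time passing
        self._entry: Optional[Tuple[float, bytes]] = None  # (fetched_at, contents)
        self.fetch_count = 0

    def get(self) -> bytes:
        now = self._clock()
        if self._entry is None or now - self._entry[0] >= self._ttl:
            # Cache is empty or older than the TTL: fetch now, on behalf of this reader.
            self._entry = (now, self._fetch())
            self.fetch_count += 1
        return self._entry[1]


if __name__ == "__main__":
    fake_now = [0.0]
    fetcher = LazyFetcher(lambda: b"file contents", ttl_seconds=3600,
                          clock=lambda: fake_now[0])
    fetcher.get()                    # first read: fetches
    fake_now[0] = 1800.0
    fetcher.get()                    # 30 minutes later: served from cache
    fake_now[0] = 7 * 24 * 3600.0
    fetcher.get()                    # a week with no reads, then one read: one fetch
    print(fetcher.fetch_count)       # 2
```

Over that simulated week, the lazy approach makes 2 fetches. An unconditional hourly background refresh would make 168. The trade-off is that the first reader after a quiet period waits for a fresh download.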