Hacker News
by hinkley 542 days ago
A web hosting site I worked for got way too much of its traffic from Google spidering. Vanity URLs make them hit you like a ton of bricks. If you throttle them, it reduces your score. If you 429 them, that seems to be even worse. Canonical URLs don't really save you, because they have to load the response to see they've already visited (that just saves your score, not your traffic). Caching can help somewhat, but if your static pages meant for bots differ too much from the real page then that's the worst offense of all. So you need internal caching, and at the end of the day you just have to scale up to deal with their bullshit. Given that Google sells cloud services, this now looks like a protection racket. Would be a shame if something happened to your website…
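
To make the internal caching point concrete, here is a rough sketch and not the actual setup from that hosting site. It assumes vanity URLs can be mapped to one canonical path, so every alias shares a single cached render and bots get exactly the same page as real users. The names `VANITY_TO_CANONICAL`, `RenderCache`, and the example paths are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical alias table: several vanity paths point at one canonical path.
VANITY_TO_CANONICAL: Dict[str, str] = {
    "/~alice": "/users/alice",
    "/alice": "/users/alice",
}


class RenderCache:
    """In-process cache of rendered pages, keyed by canonical path."""

    def __init__(
        self,
        render: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._render = render      # the expensive page render (assumed)
        self._ttl = ttl_seconds    # how long a cached render stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        # Collapse vanity aliases so a crawler walking every alias
        # triggers at most one render per TTL window.
        canonical = VANITY_TO_CANONICAL.get(path, path)
        now = self._clock()
        entry = self._entries.get(canonical)
        if entry is not None and entry[0] > now:
            return entry[1]
        body = self._render(canonical)
        self._entries[canonical] = (now + self._ttl, body)
        return body
```

This only cuts render cost. The crawler still makes one request per vanity URL, so the traffic itself does not go away.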