by Aurornis
410 days ago
> that I don’t blame site owners for blocking all bots except googlebot.

I doubt this is happening outside of a few small hobbyist websites where crawler traffic looks significant relative to human traffic. Even among those, it’s so common to move to static hosting with essentially zero cost and/or sign up for free tiers of CDNs that it’s just not worth it outside of edge cases like trying to host public-facing GitLab instances with large projects. Even then, the ROI on setting up proper caching and rate limiting far outweighs the ROI on trying to play whack-a-mole with non-Google bots. Even if someone did go to all the lengths to try to block the majority of bots, I have a really hard time believing they wouldn’t take the extra 10 minutes to look up the other major crawlers and put those on the allow list, too. This whole argument about sites going to great lengths to block search indexers but then stopping just short of allowing a couple more of the well-known ones feels like mental gymnastics for a situation that doesn’t occur.
|
That's not it. They're going to great lengths to block all bot traffic because abusive and generally incompetent actors are chewing through their resources. As evidence, I'd point out that Anubis has made the front page of HN several times within the past couple of months. It is far from the first or only solution in that space, merely one of many alternatives to the solutions provided by centralized services such as Cloudflare.