by MrJohz 430 days ago
Fwiw, I run a pretty tiny site and see relatively minimal traffic coming from bots. Most of the bot traffic, when it appears, is vulnerability scanners (the /wp-admin/ requests on a static site), and has little impact on my overall stats.

My guess is that these tools tend to be targeted at mid-sized sites — the sorts of places that are large enough to have useful content, but small enough that there probably won't be any significant repercussions, and where the ops team is small enough (or plain nonexistent) that there's not going to be much in the way of blocks. That's why a site like SourceHut gets hit quite badly, but smaller blogs stay largely out of the way.

But that's just a working theory without much evidence, an attempt to explain why I'm hearing so many people talk about struggling with AI bot traffic while not seeing it myself.

2 comments

Well, we just spun up Anubis in front of a two-user private Forgejo instance (private as in publicly accessible, but with almost all content set to private/login-protected) after it started getting hammered earlier in the week, mostly by Amazon IPs presenting as Amazonbot. Anubis cut traffic by more than 90%. From what we’ve seen (and from Xe’s own posts), it seems git forges are getting hit harder than most other sites, though, so YMMV, I guess.
I actually have a theory, based on the last episode of the 2.5 Admins podcast. Try spinning up a MediaWiki site. I have a feeling that wiki installations are being targeted to a much higher degree. You could also do a Git repo of some sort. Either of the two could give the impression that content changes frequently.
Yep, I'm running a pretty sizeable game wiki and it's being scraped to hell with very specific URLs that pretty much guarantee cache busting (usually revision IDs and diffs).
I could believe that. Plus, because both of those are more dynamic, they're going to have to do more work per request anyway, meaning the effects of scraping are exacerbated.