by Spivak
432 days ago
You have it exactly right, except for the reason to allow them in the first place: they're bots that provide reciprocal value to the site owner. Otherwise, why even bother letting them through? It's wild how people don't get that Facebook and Googlebot get let through paywalls and such because they bring the site real, tangible revenue. If you want the same privileges, you have to start with the monetary value you provide to the sites you index. Lead gen is hard, and major search engines provide crazy value for next to nothing.
|
Search bots, and especially Google, provide my site a lot of value. They respect robots.txt, they identify properly as bots, and I can see that about half my visits come from search. It's almost impossible to notice a search bot in the graphs.
But AI bots suck. They don't even read robots.txt, they hit the site as hard as it can take, and when they receive a 5xx, a 444 or a 426 they interpret it as "keep requesting hard until you get a 200". They can easily DoS or bankrupt a small site, and they use fake user agents. As the OP post shows, their activity can be clearly seen in the log graphs as huge spikes coming from a single client. OpenAI scanned 100% of one of my sites (more than 20,000 individual pages) in two days, causing intermittent DoS, while Google is at 80% of the sitemap.xml. And the cherry on top: I still can't see a single visit in my logs that comes from their services.
I think you might be confusing search bots with AI bots.
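For anyone who wants to check their own logs for the single-client spikes described above, here is a rough sketch. It assumes an access log in the common/combined format (client IP first, timestamp in square brackets); the function names and the 300 requests per minute threshold are made up for illustration, so tune them to your own traffic.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log shape (common/combined format): client IP, two fields, then [timestamp].
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

def requests_per_minute(lines):
    """Count requests per (client, minute) bucket; lines that don't match are skipped."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        client, stamp = m.groups()
        ts = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[(client, ts.strftime("%Y-%m-%d %H:%M"))] += 1
    return counts

def find_spikes(lines, threshold=300):
    """Return (client, minute, hits) for buckets above the threshold, busiest first.

    The default threshold is an arbitrary placeholder, not a recommended value.
    """
    counts = requests_per_minute(lines)
    spikes = [(c, minute, n) for (c, minute), n in counts.items() if n > threshold]
    return sorted(spikes, key=lambda s: s[2], reverse=True)
```

Feeding it an open log file, e.g. `find_spikes(open("access.log"))` with whatever path your server writes to, lists the clients and minutes worth a closer look. It only groups by IP, so a crawler spread over many addresses would need grouping by subnet or user agent instead.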