|
|
|
|
|
by ThePinion
382 days ago
|
|
Can you further elaborate on this robots.txt? I was under the impression most AI just completely ignores anything to do with robots.txt so you may just be hitting the ones that are maybe attempting to obey it? I'm not against the idea like others here seem to be, I'm more curious about implementing it without harming good actors. |
|
But in my experience it isn't the robots.txt violations being so flagrant (half the requests are probably humans who were curious what you're hiding, and most bots written specifically for LLMs don't even check the robots.txt). The real abuse is the crawler that hits an expensive and frequently-changing URL more often than reasonable, and the card-testers hitting payment endpoints, sometimes with excessive chargebacks. And port-scanners, but those are a minor annoyance if your network setup is decent. And email spoofers who bring your server's reputation down if you don't set things up correctly early on and whenever changing hosts.