Hacker News
by waisbrot 2539 days ago
I was hoping this would be about putting an orphan path in your robots.txt and then black-listing clients who tried to fetch it -- nobody should know about it except robots who are told not to go there, so anyone who visits the link is an adversary.
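
For concreteness, the trap described above could be sketched roughly like this. It is a minimal illustration only: the path `/private-archive/`, the class `TrapBlocklist`, and the in-memory set are all made-up names and assumptions, not anything from the linked article.

```python
# Hypothetical orphan path: nothing on the site links to it, so the only
# place a client can learn about it is robots.txt.
TRAP_PATH = "/private-archive/"

# The robots.txt body that advertises the trap as off-limits.
ROBOTS_TXT = f"User-agent: *\nDisallow: {TRAP_PATH}\n"


class TrapBlocklist:
    """Remembers client addresses that requested the disallowed trap path."""

    def __init__(self, trap_path=TRAP_PATH):
        self.trap_path = trap_path
        self.blocked = set()  # in-memory only; an assumption for this sketch

    def check(self, client_ip, path):
        """Return True if the request may be served, False if it is refused."""
        if client_ip in self.blocked:
            return False
        if path.startswith(self.trap_path):
            # Only a client that read robots.txt and ignored it lands here.
            self.blocked.add(client_ip)
            return False
        return True
```

A request handler would call `check(ip, path)` before serving anything and return a 403 when it gets `False`.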
4 comments

As one small data point:

I've been running this experiment (see my other comment). While bots continuously hammer port 22 (ssh) and repeatedly request things like /wp-* (I don't even run PHP), they don't bother fetching robots.txt in the first place, and my honeypot hasn't had a single hit.

Definitely do not try to "secure" your site this way. Either the bots are not sophisticated enough to parse robots.txt, or this is already a known technique that they avoid. It seems many other commenters came up with the same idea.

If you're an adversary trying to snoop on port 22, why would you bother to respect the conventions of robots.txt to begin with?
Not necessarily the same bot. And they're not snooping so much as brute-forcing default/common/random(?) usernames & passwords.
Funny experiment and perhaps also useful, but there are crawlers with good intentions[1] that may still ignore the disallows. I don't know of any besides the Internet Archive, though.

[1] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

Those crawlers can almost always be recognized by the UA.
Yes, no doubt.
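
A rough sketch of exempting such crawlers by UA before applying a trap might look like the following. The token list and the function name `is_known_crawler` are assumptions for illustration (the `archive.org_bot` substring is my understanding of what the Internet Archive's crawler sends). Keep in mind the User-Agent header is supplied by the client, so anyone can fake it.

```python
# Substrings assumed to identify well-meaning crawlers; extend as needed.
# Stored in lowercase because the comparison below is case-insensitive.
KNOWN_CRAWLER_TOKENS = ("archive.org_bot",)


def is_known_crawler(user_agent):
    """Return True if the User-Agent contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_CRAWLER_TOKENS)
```

A client matching this check could be logged rather than blocked when it touches a disallowed path.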
Definitely an interesting idea, I should check the scene to figure out how many 'adversaries' are actually scanning robots.txt files.
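
One way to get a first number is to count, from an access log, how many distinct clients ever request /robots.txt. The sketch below assumes Common Log Format style lines (client address first, then a quoted request line); the regex and the function name `robots_fetch_counts` are illustrative, not taken from any particular server.

```python
import re

# Start of a Common Log Format line: client address, two fields,
# a bracketed timestamp, then the quoted request line (method and path).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def robots_fetch_counts(lines):
    """Return (clients that fetched /robots.txt, all clients seen)."""
    all_clients = set()
    robots_clients = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not look like log entries
        client_ip, path = match.groups()
        all_clients.add(client_ip)
        # Ignore any query string when comparing the path.
        if path.split("?", 1)[0] == "/robots.txt":
            robots_clients.add(client_ip)
    return len(robots_clients), len(all_clients)
```

Passing an open log file to `robots_fetch_counts` works, since it only needs an iterable of lines.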
Ooo that's a good idea, I will definitely start implementing this