HN Mirror


	by tptacek 2536 days ago
	There's a running joke among web pentesters about robots.txt being the first place you look when hitting a new site.

8 comments

acdha 2536 days ago

Meanwhile over in .gov I’ve had to explain to a pentester that it wasn’t a security problem that robots.txt was accessible without authentication. The finding came from a very big vendor’s scanner that had badly regurgitated the OWASP advice.

duxup 2536 days ago

The "security" world has an unusually high level of total incompetence. It is scary.

acdha 2536 days ago

This is common any time there’s so much demand: in the late 90s it was not uncommon to be in a room full of people who were ostensibly web developers and didn’t understand how the web or their backend servers worked but were certain they were about to become rich.

Security is especially bad because so many large organizations are under pressure to improve but the market is tight and the pool of experts is limited. Also, many places have outsourced to large contracting companies who don’t want to admit they don’t have enough qualified staff and will hope that you’ll be satisfied with whoever they deliver.

duxup 2536 days ago

Yeah no doubt it is a phase.

It's just a really nasty phase right now.

I always think of this:

https://medium.com/@djhoulihan/no-panera-bread-doesnt-take-s...

sixplusone 2536 days ago

A few years ago I purposefully put a couple of "interesting" paths in the robots.txt as a honeypot to test/capture bot conformance and malicious actors. Not one hit ever.
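One way to check a honeypot like this, sketched in Python under some assumptions: the server writes Common Log Format access logs, and the trap paths are listed only in robots.txt and linked from nowhere else. The path `/old-admin-panel/` and the function name `honeypot_hits` are made-up examples. Any client that requests a trap path either ignored robots.txt or read it looking for targets.

```python
import re
from collections import defaultdict
from collections.abc import Iterable

# Captures the client address and request target from a Common Log Format line, e.g.
# 203.0.113.7 - - [10/Oct/2019:13:55:36 -0700] "GET /path HTTP/1.1" 404 153
CLF_REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[A-Z]+ (\S+)[^"]*"')


def honeypot_hits(log_lines: Iterable[str], trap_paths: Iterable[str]) -> dict[str, list[str]]:
    """Map each client address to the sorted trap paths it requested.

    Hypothetical helper: trap_paths are prefixes that appear only in robots.txt.
    """
    traps = list(trap_paths)
    hits: dict[str, set[str]] = defaultdict(set)
    for line in log_lines:
        match = CLF_REQUEST.match(line)
        if not match:
            continue  # skip lines that are not Common Log Format
        client, target = match.groups()
        path = target.split("?", 1)[0]  # ignore any query string
        for trap in traps:
            if path.startswith(trap):
                hits[client].add(trap)
    return {client: sorted(found) for client, found in hits.items()}


if __name__ == "__main__":
    logs = [
        '203.0.113.7 - - [10/Oct/2019:13:55:36 -0700] "GET /old-admin-panel/login.php HTTP/1.1" 404 153',
        '198.51.100.2 - - [10/Oct/2019:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 512',
    ]
    print(honeypot_hits(logs, ["/old-admin-panel/"]))
```

Prefix matching means a request for anything under a trap directory counts as a hit, which is usually what you want when the robots.txt entry is a directory.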

0xDEFC0DE 2536 days ago

They just found a path further up and compromised you via that instead of bothering with the rest of the robots.txt :D

wybiral 2536 days ago

A while back I wrote a Python script to watch for links posted on Twitter and then scrape their /robots.txt file [1]. The requests are routed through Tor for privacy purposes.

It's been incredibly enlightening. One thing that sticks out immediately is that in many cases you can identify the underlying HTTP framework from the default robots.txt it ships with. Sometimes even the exact version.

And, yes, people do use the robots file to "protect" or "hide" endpoints and they can effectively be used to enumerate potential endpoints worth investigating further (from a pentesting perspective).

[1] https://gist.github.com/wybiral/20c20ccf00b6c93506b8acdc6ccb...
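A minimal Python sketch of the enumeration step, assuming plain-text robots.txt input: it collects the unique Disallow paths a pentester would look at next. The names `disallowed_paths` and `fetch_robots` are hypothetical and not taken from the linked gist, and the fetch helper does no Tor or proxy routing.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique Disallow paths from robots.txt text, in file order.

    Hypothetical helper; it ignores which User-agent group a rule belongs to.
    """
    seen: set[str] = set()
    paths: list[str] = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        # An empty Disallow means "allow everything", so it names no path.
        if field.strip().lower() == "disallow" and value and value not in seen:
            seen.add(value)
            paths.append(value)
    return paths


def fetch_robots(base_url: str, timeout: float = 10.0) -> str:
    """Fetch /robots.txt for a site directly (hypothetical helper, no Tor)."""
    with urlopen(urljoin(base_url, "/robots.txt"), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = """
User-agent: *
Disallow: /admin/   # keep out
Disallow: /backup/
Disallow:
User-agent: Googlebot
Disallow: /admin/
"""
    for path in disallowed_paths(sample):
        print(path)
```

Collapsing all User-agent groups into one list is a deliberate simplification: for enumeration, every path an admin felt the need to hide from any crawler is interesting.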

gerdesj 2536 days ago

Silly old me always starts with / in a browser. Then I click on links. Not all sites leak information like a sieve with the wire bit removed but many do. There is sometimes no need to do anything clever like look for robots.txt.

koolba 2536 days ago

It’s like walking through an office and seeing an unlocked door with a “Do not enter” sign.

stusmall 2536 days ago

In addition to the obvious (it is literally a list of places where admins don't want anyone to look), it is also often useful for backend technology enumeration.

jwarren116 2536 days ago

It’s very literally the second bullet point on my enumeration list for web apps, right behind looking at the DNS records for the domain.

leetbulb 2536 days ago

It's far from a joke.