Meanwhile over in .gov I’ve had to explain to a pentester that it wasn’t a security problem that robots.txt was accessible without authentication, based on a very big vendor’s scanner having badly regurgitated the OWASP advice.
This is common any time there’s so much demand: in the late 90s it was not uncommon to be in a room full of people who were ostensibly web developers and didn’t understand how the web or their backend servers worked but were certain they were about to become rich.
Security is especially bad because so many large organizations are under pressure to improve but the market is tight and the pool of experts is limited. Also, many places have outsourced to large contracting companies who don’t want to admit they don’t have enough qualified staff and will hope that you’ll be satisfied with whoever they deliver.
A few years ago I purposefully put a couple of "interesting" paths in robots.txt as a honeypot, both to test bot conformance and to catch malicious actors. Not one hit ever.
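For anyone wanting to repeat the experiment, checking for hits can be as simple as grepping the access log for the decoy paths. A minimal sketch, assuming Common Log Format style lines; the decoy paths and the function name `honeypot_hits` are made up for illustration and are not the ones I actually used:

```python
import re

# Hypothetical decoy paths listed under Disallow in robots.txt.
# Nothing on the site links to them, so any request is suspicious.
DECOY_PATHS = ("/admin-backup/", "/private-export/")

# Matches the client IP and the quoted request line of a
# Common Log Format style entry: IP ... "METHOD /path HTTP/x.y" status
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def honeypot_hits(log_lines, decoys=DECOY_PATHS):
    """Return (ip, path) pairs for requests whose path starts with a decoy."""
    hits = []
    for line in log_lines:
        match = LOG_LINE.match(line)
        # str.startswith accepts a tuple of prefixes
        if match and match.group("path").startswith(tuple(decoys)):
            hits.append((match.group("ip"), match.group("path")))
    return hits
```

Feed it the lines of your access log; an empty list means nobody took the bait, which is what I saw.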
A while back I wrote a Python script to watch for links posted on Twitter and then scrape their /robots.txt file [1]. The requests are routed through Tor for privacy purposes.
It's been incredibly enlightening. One thing that sticks out immediately is that you can identify the underlying HTTP framework in many cases due to the defaults. Sometimes even the exact version.
And, yes, people do use the robots file to "protect" or "hide" endpoints, and those entries can effectively be used to enumerate potential endpoints worth investigating further (from a pentesting perspective).
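To give an idea of the enumeration step, here is a rough sketch of pulling the Disallow entries out of an already-downloaded robots.txt. This is not the script from [1]; fetching (and the Tor routing) is left out, and the function name `disallowed_paths` is just something I picked for the example:

```python
def disallowed_paths(robots_txt):
    """Return the unique Disallow values from robots.txt text, in order seen."""
    paths = []
    for raw in robots_txt.splitlines():
        # Drop trailing comments, then surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if not sep:
            continue  # not a "field: value" line
        if field.strip().lower() == "disallow":
            value = value.strip()
            # An empty Disallow means "nothing is disallowed", so skip it
            if value and value not in paths:
                paths.append(value)
    return paths
```

The result is only a list of candidates to look at; whether any of them is actually sensitive still needs checking by hand.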
Silly old me always starts with / in a browser. Then I click on links. Not all sites leak information like a sieve with the wire bit removed but many do. There is sometimes no need to do anything clever like look for robots.txt.
Besides the obvious, that it is literally a list of places admins don't want you to look, it is also often useful for backend technology enumeration.
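The technology enumeration part can be sketched as a simple substring match against known default entries. The marker strings below are assumptions for illustration (paths I believe show up in stock WordPress and Drupal robots.txt files), so verify them against real defaults before trusting a match; `SIGNATURES` and `guess_backends` are hypothetical names:

```python
# Marker substrings are assumptions for illustration, not an authoritative
# list; check them against the actual default files before relying on them.
SIGNATURES = {
    "WordPress": ("/wp-admin/",),
    "Drupal": ("/filter/tips",),
}


def guess_backends(robots_txt, signatures=SIGNATURES):
    """Return sorted names of backends whose marker appears in the text."""
    text = robots_txt.lower()
    return sorted(
        name
        for name, markers in signatures.items()
        if any(marker in text for marker in markers)
    )
```

A match is a hint rather than proof, since anyone can copy another project's robots.txt.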