by dTal 2538 days ago
Honestly, I think it couldn't hurt, if done appropriately. If crawlers are indexing those pages, then they're publicly available anyway, and could be crawled by a determined attacker, so nothing in robots.txt ought to be truly sensitive. But if there are pages that ought to be secure, but might contain an exploitable vulnerability, putting their path in robots.txt at least limits their exposure to those determined enough to look, rather than any lazy script kiddie using Google to search your site.

Obviously you shouldn't rely on it, but defense in depth as always.

2 comments

If you want that as an additional safeguard, set the noindex header (X-Robots-Tag: noindex) on that path at your edge so you’re not calling attention to it:

https://developers.google.com/search/reference/robots_meta_t...
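
For anyone who wants to see the shape of it, here's a rough sketch in Python. It assumes the "edge" is something you can wrap as a WSGI middleware, which is only a stand-in for whatever CDN or reverse proxy you actually run. The `/internal/` prefix and the function name are made up for illustration; the only real part is the `X-Robots-Tag: noindex` response header.

```python
# Hypothetical path prefix; the thread does not name a real one.
PROTECTED_PREFIX = "/internal/"


def noindex_middleware(app, prefix=PROTECTED_PREFIX):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(prefix):
                # Drop any existing value so the header is not duplicated.
                headers = [
                    (name, value)
                    for name, value in headers
                    if name.lower() != "x-robots-tag"
                ]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)

    return wrapped
```

Unlike a robots.txt entry, nothing here advertises the path: a crawler only sees the header if it already requested the URL. It does nothing for access control, so it's still just the extra layer.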

I’d also strongly recommend pairing this with outside monitoring that alerts you if something accidentally becomes reachable, since it’s really easy not to notice something being reachable from more places than intended.
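
A minimal sketch of that kind of check, in Python with only the standard library. The URL list and function names are hypothetical, and treating "any 2xx response" as "exposed" is my assumption, not a rule from anywhere. It only means something if it runs from a machine outside your network.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Optional

# Hypothetical URLs that should NOT be reachable from the public internet.
SHOULD_BE_PRIVATE = ["https://example.com/internal/admin"]


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for `url`, or None if the connection failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; the code is still useful.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def is_exposed(status: Optional[int]) -> bool:
    """Assumption: any 2xx answer means the page was served to an outsider."""
    return status is not None and 200 <= status < 300


def find_exposed(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> List[str]:
    """Return the URLs that answered with a success status."""
    return [url for url in urls if is_exposed(fetch(url))]


if __name__ == "__main__":
    for url in find_exposed(SHOULD_BE_PRIVATE):
        # Replace with a real alert (email, pager, chat webhook).
        print(f"ALERT: {url} is reachable from outside")
```

One caveat: `urlopen` follows redirects, so a path that bounces to a login page returning 200 will be flagged too. Whether that counts as a false positive depends on your setup.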

But then archive.org will ignore it (but still point its crawler at the directories you helpfully listed for it), and those cached copies will show up in Google.