Hacker News
by crazygringo 1406 days ago
You don't need to be snarky.

Robots.txt isn't for hiding/suppressing information.

Often you have whole URL structures that are redundant with other ones, mainly database-generated pages with all sorts of possible query parameters, often disguised as paths. Robots.txt is extremely useful for letting crawlers make life easier for themselves by sticking to the "real" content, as opposed to the redundant stuff: crawling the 5,000 real pages, not the 500,000 additional URLs that return the same content.

It's also useful for excluding "interactive" pages like login pages, which make zero sense to crawl.
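
To make that concrete, here's a rough sketch of how a well-behaved crawler might apply such rules, using Python's standard `urllib.robotparser`. The robots.txt contents, the example.com URLs, the `ExampleBot` user agent, and the helper names are all made up for illustration; a real site's rules would look different. Note that this parser does plain prefix matching on paths, so the rules below avoid wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site. The paths are assumptions:
# one redundant, database-generated URL tree and one interactive page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /products/filter/
Disallow: /login
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawlable(parser: RobotFileParser, urls, agent: str = "ExampleBot"):
    """Return only the URLs this (hypothetical) crawler is allowed to fetch."""
    return [url for url in urls if parser.can_fetch(agent, url)]


parser = build_parser(ROBOTS_TXT)
urls = [
    "https://example.com/products/42",                           # "real" page
    "https://example.com/products/filter/color/red/sort/price",  # redundant view
    "https://example.com/login",                                 # interactive page
]
print(crawlable(parser, urls))
```

Only the "real" product page gets through; the filter URLs and the login page are skipped before the crawler ever requests them.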

People "give a crap" about robots.txt because it's useful for that.