by foob 2559 days ago
This is a polite thing to do, but I don't think there is any legal precedent for it being an actual requirement. Notably, both Apple and the Wayback Machine publicly disregard robots.txt files [1]. I would be very curious to read any court ruling that determined that a robots.txt file needs to be respected.

[1] - https://intoli.com/blog/analyzing-one-million-robots-txt-fil...

2 comments

It depends on the intention. You should respect robots.txt for search indexing, for example, but not necessarily for something like archiving or creating alternative page layouts (e.g., an outline or reader view).
The Wayback Machine does look at robots.txt - https://help.archive.org/hc/en-us/articles/360004651732-Usin...
They look at them, but they don't follow them strictly [1]. They make judgement calls on what they should do rather than treating robots.txt files as a legal contract.

[1] - https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
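
For anyone who wants to see the "depends on the intention" idea in code, here is a rough sketch using Python's standard urllib.robotparser. It only consults robots.txt when the crawl purpose is one the crawler treats as binding. The purpose labels, the STRICT_PURPOSES set, the may_fetch helper, the ExampleBot user agent, and the sample robots.txt are all made up for illustration; this is not how Apple or the Internet Archive actually decide, and it says nothing about what is legally required.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string so no network is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Assumption: purposes for which this sketch treats robots.txt as binding.
STRICT_PURPOSES = {"search-indexing"}


def may_fetch(robots_txt: str, user_agent: str, url: str, purpose: str) -> bool:
    """Return True if this (hypothetical) crawler would fetch the URL."""
    # For purposes outside the strict set, robots.txt is not consulted at all.
    if purpose not in STRICT_PURPOSES:
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/private/page"
    print(may_fetch(ROBOTS_TXT, "ExampleBot", url, "search-indexing"))
    print(may_fetch(ROBOTS_TXT, "ExampleBot", url, "archiving"))
```

A real crawler would more likely make the judgement call per site rather than per purpose, but the shape is the same: the file is an input to a policy decision, not the decision itself.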