by opless 3966 days ago
Just in case a robots.txt kills that

http://pastebin.com/rcPSyRnR

2 comments

It's also on seclists.org -

http://seclists.org/isn/2015/Aug/4

Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.
Apparently yes, it would: https://archive.org/about/exclude.php
My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.
Yes, it simply hides the content. It is still kept in their database, so if the robots.txt disappears, it pops back from their archive.

New pages won't be archived though.
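
For anyone wondering what "hidden, not deleted" could look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. It is not archive.org's actual code: the `stored` dictionary, the `lookup` helper, the example URL, and the `ia_archiver` user-agent string are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical archive store: snapshots stay here regardless of visibility.
stored = {"http://example.com/page": "snapshot captured earlier"}


def is_visible(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the site's *current* robots.txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def lookup(url: str, current_robots_txt: str):
    """Serve a stored snapshot only while the current robots.txt permits it."""
    if not is_visible(current_robots_txt, url):
        return None  # hidden from readers, but nothing is removed from `stored`
    return stored.get(url)


blocking = "User-agent: *\nDisallow: /\n"
print(lookup("http://example.com/page", blocking))  # hidden while the rule is present
print(lookup("http://example.com/page", ""))        # visible again once the rule is gone
```

The point of the sketch is that the visibility check runs at lookup time against the current robots.txt, so the stored snapshot is untouched and reappears once the exclusion rule is removed.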