Hacker News
by JoshTriplett 3839 days ago
I think the archive.org crawler should respect robots.txt as it looked at the time of the crawl. As a well-behaved robot, archive.org's crawler should fetch and respect robots.txt each time it crawls. However, archive.org should not retroactively delete old content when the current site puts up a robots.txt.
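
A rough sketch of that policy, not archive.org's actual implementation: the decision is made once, against the robots.txt fetched during that crawl, and stored snapshots are never revisited afterwards. The `Archive` class, the `allowed_at_crawl_time` helper, the `ia_archiver` user-agent token, and the example.com URLs are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_at_crawl_time(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against the robots.txt text fetched during this crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class Archive:
    """Hypothetical archive that only ever appends snapshots."""

    def __init__(self, user_agent: str = "ia_archiver"):  # assumed token
        self.user_agent = user_agent
        self.snapshots = []  # (crawl_date, url, content) tuples

    def crawl(self, crawl_date: str, url: str, robots_txt: str, content: str) -> bool:
        # Respect robots.txt as it looks right now, for this crawl only.
        if not allowed_at_crawl_time(robots_txt, self.user_agent, url):
            return False
        self.snapshots.append((crawl_date, url, content))
        return True


archive = Archive()
# First crawl: the site has no robots.txt rules, so the page is stored.
first = archive.crawl("2005-01-01", "https://example.com/page", "", "old content")
# Later crawl: the site now disallows everything, so nothing new is stored,
# but the earlier snapshot is left alone.
second = archive.crawl("2015-01-01", "https://example.com/page",
                       "User-agent: *\nDisallow: /", "new content")
```

Here `first` is `True`, `second` is `False`, and `archive.snapshots` still holds the 2005 copy: the later robots.txt blocks new crawls without reaching back in time.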

(To answer your other question, the robots.txt standard already allows giving different instructions to different crawlers.)
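
For illustration, here is a small robots.txt with one group for a specific crawler and a catch-all group for everyone else, checked with Python's standard `urllib.robotparser`. The rules, the example.com URLs, and the bot names (`ia_archiver`, `SomeOtherBot`) are assumptions picked for the example, not any real site's policy.

```python
from urllib.robotparser import RobotFileParser

# Two groups: one crawler gets most of the site, everyone else gets nothing.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The named crawler matches its own group and may fetch public pages.
archiver_public = parser.can_fetch("ia_archiver", "https://example.com/index.html")
# ...but not the paths that group disallows.
archiver_private = parser.can_fetch("ia_archiver", "https://example.com/private/notes.html")
# Any other crawler falls through to the catch-all group and is blocked.
other_public = parser.can_fetch("SomeOtherBot", "https://example.com/index.html")

print(archiver_public, archiver_private, other_public)  # True False False
```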