| HN Mirror

	by JoshTriplett 3839 days ago
	I think the archive.org crawler should respect robots.txt as it looked at the time of the crawl. As a well-behaved robot, archive.org's crawler should fetch and respect robots.txt each time it crawls. However, archive.org should not retroactively delete old content when the current site puts up a robots.txt. (To answer your other question, the robots.txt standard already allows giving different instructions to different crawlers.)
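
	To illustrate the per-crawler point, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the example.com URLs, and the `allowed` helper are made up for illustration, and I'm assuming `ia_archiver` as the user-agent token for the archive.org crawler. It is not a description of how archive.org actually evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a specific crawler, one for everyone else.
# "ia_archiver" is assumed here as the archive.org crawler's user-agent token.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /private/

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the hypothetical robots.txt above lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ia_archiver", "https://example.com/index.html"),
        ("ia_archiver", "https://example.com/private/notes.html"),
        ("SomeOtherBot", "https://example.com/index.html"),
    ]:
        print(agent, url, allowed(agent, url))
```

	With this file, the named crawler may fetch everything except `/private/`, while every other crawler falls through to the `*` group and is blocked entirely. A crawler that re-fetches robots.txt on each crawl would apply whatever rules are in place at that moment, which is the behavior I'm arguing for.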