Hacker News
by hughw
3966 days ago
Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.
3 comments
mikeash
3966 days ago
Apparently yes, it would:
https://archive.org/about/exclude.php
syncsynchalt
3965 days ago
My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.
X-Istence
3966 days ago
Yes. It simply hides the content: the content is still kept in their database, so if the robots.txt rule disappears, it pops back up from their archive.
New pages won't be archived though.
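A rough model of the "hide, don't delete" behaviour described above, as a minimal sketch only. It assumes the archive filters stored snapshots at display time against the site's current robots.txt, and that the crawler identifies as `ia_archiver` (the agent name archive.org's exclusion instructions used; treat it as an assumption). The helper names `visible_snapshots` and `may_archive` are hypothetical and not archive.org code. Only the standard library's `urllib.robotparser` is used.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler user agent (hypothetical for this sketch).
AGENT = "ia_archiver"


def parse_robots(text):
    """Parse a robots.txt body supplied as a string."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    rp.modified()  # mark as "read" so can_fetch() doesn't default to False
    return rp


def visible_snapshots(snapshots, robots_text, agent=AGENT):
    """Return stored snapshot URLs that may be shown under the CURRENT robots.txt.

    Nothing is deleted: snapshots are only filtered when displayed, so
    dropping the Disallow rule makes them visible again.
    """
    rp = parse_robots(robots_text)
    return [url for url in snapshots if rp.can_fetch(agent, url)]


def may_archive(url, robots_text, agent=AGENT):
    """Whether a new crawl of `url` is allowed (new pages are skipped if not)."""
    return parse_robots(robots_text).can_fetch(agent, url)


if __name__ == "__main__":
    stored = ["http://example.com/", "http://example.com/page.html"]
    blocked = "User-agent: ia_archiver\nDisallow: /\n"
    print(visible_snapshots(stored, blocked))  # hidden while the rule exists
    print(visible_snapshots(stored, ""))       # back once robots.txt is gone
```

Because `stored` is never modified, the same list reappears in full as soon as the robots.txt body no longer disallows the agent. That matches the "pops back" behaviour, while `may_archive` covers "new pages won't be archived".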