Hacker News
by Spare_account 3351 days ago
Here's a slight modification to the GP proposal:

- Respect robots.txt as it stands at the time you crawl the site.

- If robots.txt appears later, stop archiving from that date forwards.

- Preserve access to old archived copies of the site by default.

- Offer a mechanism that allows a proven site owner to explicitly request retrospective access removal.

If archive.org have recorded the date that they first observed a robots.txt on the sites currently unavailable, they could even consider applying the above logic today retrospectively. Perhaps after a couple of warning emails to the current Administrative Contact for the domain.
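
To make the four rules concrete, here is a rough Python sketch of how they could fit together. Everything in it is hypothetical: `SiteRecord`, `may_crawl`, `may_serve` and their fields are names made up for illustration, not anything archive.org actually exposes. It also assumes a robots.txt stays in place once it appears, and it leaves out how an owner would be verified.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SiteRecord:
    # Hypothetical per-domain bookkeeping, invented for this sketch.
    robots_first_seen: Optional[date] = None  # first date a blocking robots.txt was observed
    owner_requested_removal: bool = False     # set only after a proven owner explicitly asks


def may_crawl(site: SiteRecord, today: date, robots_blocks_now: bool) -> bool:
    """Rules 1 and 2: obey robots.txt as found today and record when it first appeared."""
    if robots_blocks_now:
        if site.robots_first_seen is None:
            site.robots_first_seen = today
        return False
    return True


def may_serve(site: SiteRecord, snapshot_date: date) -> bool:
    """Rules 3 and 4: old copies stay visible unless the owner asked for removal."""
    if site.owner_requested_removal:
        return False
    # Copies taken before the first observed robots.txt remain accessible by default.
    return site.robots_first_seen is None or snapshot_date < site.robots_first_seen
```

Applying `may_serve` to existing snapshots is the retrospective case: it only works if the first-seen date was actually recorded for each domain.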

1 comment

>mechanism that allows a proven site owner to explicitly request retrospective access removal. //

It should be "a proven content owner"; just buying a site shouldn't allow someone to remove it from the archive.