by slrz 3611 days ago
I sort of get why they have to do that

I don't. Can you explain?


Archive.org doesn't know the domain changed hands, just that it used to be allowed to show the results but now no longer is.
Doesn't really explain why they have to nuke it, even if it is the current site owner. Respecting robots.txt is one thing, but that just means not spidering and archiving the content that is now there. Deleting already archived material based on later changes to robots.txt is a non-obvious behavior, given the usual understanding of the general meaning of robots.txt.
They're not deleting it, just hiding it from public access. Once the squatter goes away, the content comes back.
What's the difference? Both make this feature (and more general use of the archive) useless.
The difference is exactly what I said: if they deleted it, it's gone forever. If they hide it, it can come back. I've seen pages I cite disappear for a year or two thanks to scummy squatters - but they came back! It's the difference between being sentenced to execution and to 1 year of prison.
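
To make the hide-versus-delete distinction concrete, here is a minimal sketch, not the Archive's actual implementation. It assumes visibility is decided at lookup time from whatever robots.txt the domain serves today; the `snapshots` store, the `lookup` function, and the `ia_archiver` agent name are illustrative assumptions. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical snapshot store: captures are never removed from it.
snapshots = {"http://example.com/article": ["2009 capture", "2011 capture"]}


def is_publicly_visible(current_robots_txt: str, url: str,
                        agent: str = "ia_archiver") -> bool:
    """Apply the domain's *current* robots.txt to the requested URL."""
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def lookup(url: str, current_robots_txt: str) -> list:
    """Return stored captures, or nothing while robots.txt blocks the URL."""
    if not is_publicly_visible(current_robots_txt, url):
        return []  # hidden from the public, but still present in `snapshots`
    return snapshots.get(url, [])


# A squatter's blanket block versus a permissive file from a later owner.
squatter_robots = "User-agent: *\nDisallow: /\n"
open_robots = "User-agent: *\nDisallow:\n"

hidden = lookup("http://example.com/article", squatter_robots)
restored = lookup("http://example.com/article", open_robots)
```

Under these assumptions the captures stay in storage the whole time; only the result of `lookup` changes when the robots.txt does, which is why the pages can come back.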
Could they not also watch when a domain changes ownership and segment history based on the owner?

Over the time scales that archive.org holds on to data, domain ownership itself becomes part of the history. While permitting someone to hide a mistake for security reasons is reasonable, allowing erasure of past owners' history by the current owner is counter to their stated purpose.

No, because the WHOIS details on the domain can change without ownership actually having changed, in the not-uncommon case where a domain starts out registered by a founder or early employee and is later transferred to the company proper.

Given the prevalence of bogus WHOIS data, the inverse is also possible: if the 2nd owner uses the same registrar and "privacy-protection" feature as the original owner, the WHOIS data could appear to have not changed, except for the start date of the registration, which would look identical to a single owner who re-registered their domain after allowing it to lapse.
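
A small sketch of why both failure modes bite, assuming a naive heuristic that compares the registrant field between two WHOIS lookups. `WhoisRecord`, `looks_like_new_owner`, and all the sample values are hypothetical and do not reflect any real WHOIS library or real registrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhoisRecord:
    """Hypothetical, simplified view of a WHOIS response."""
    registrant: str
    registrar: str
    created: str  # registration start date, ISO format


def looks_like_new_owner(old: WhoisRecord, new: WhoisRecord) -> bool:
    """Naive heuristic: treat any registrant change as a change of owner."""
    return old.registrant != new.registrant


# Case 1: founder transfers the domain to their own company (same real owner).
founder = WhoisRecord("Jane Founder", "ExampleRegistrar", "2005-01-01")
company = WhoisRecord("Example Corp", "ExampleRegistrar", "2005-01-01")

# Case 2: two unrelated owners behind the same privacy proxy.
owner_a = WhoisRecord("Privacy Proxy Service", "ExampleRegistrar", "2005-01-01")
owner_b = WhoisRecord("Privacy Proxy Service", "ExampleRegistrar", "2012-06-01")

false_positive = looks_like_new_owner(founder, company)  # flagged, same owner
false_negative = looks_like_new_owner(owner_a, owner_b)  # missed, new owner
```

Falling back to the `created` date does not rescue case 2, since a changed date is also what a single owner who let the domain lapse and re-registered it would produce.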

You set up a website that fails to do authorization properly. You accidentally expose personal information about your employees, which gets harvested by the archive project. You fix the website, but how do you remove the exposed information from the archives?