by dingaling 3611 days ago
Or someone else's website to which you now own the domain name

It's so infuriating when I come across a domain squatter that nuked the entire history of a domain in the Wayback Machine. I sort of get why they have to do that but it also defeats most of the point of the Wayback Machine.
> I sort of get why they have to do that

I don't. Can you explain?

Archive.org doesn't know the domain changed hands, just that it used to be allowed to show the results but now no longer is.
Doesn't really explain why they have to nuke it, even if it is the current site owner. Respecting robots.txt is one thing, but that just means not spidering and archiving the content that is now there. Deleting already archived material based on later changes to robots.txt is a non-obvious behavior, given the usual understanding of the general meaning of robots.txt.
They're not deleting it, just hiding it from public access. Once the squatter goes away, the content comes back.
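A rough sketch of the behavior described here, as I understand it from this thread: visibility of every stored snapshot is decided by the domain's current robots.txt, not the one in force when the snapshot was taken. The function name and the `ia_archiver` agent string are my assumptions for illustration, not archive.org's actual code.

```python
from urllib.robotparser import RobotFileParser


def is_publicly_visible(current_robots_txt: str, snapshot_url: str,
                        agent: str = "ia_archiver") -> bool:
    """Hypothetical check: apply TODAY's robots.txt to an OLD snapshot URL."""
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(agent, snapshot_url)


old_page = "http://example.com/old-page"

# The original owner had no restrictions; the squatter blocks everything.
original_robots = ""
squatter_robots = "User-agent: *\nDisallow: /\n"

visible_before = is_publicly_visible(original_robots, old_page)
visible_after = is_publicly_visible(squatter_robots, old_page)
```

Nothing is deleted in this model. The same stored snapshot just flips from visible to hidden when the live robots.txt changes, and flips back if the block is lifted.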
What's the difference? Both make this feature (and more general use of the archive) useless.
Could they not also watch when a domain changes ownership and segment history based on the owner?

Over the time scales that archive.org holds on to data, domain ownership itself becomes part of the history. While permitting someone to hide a mistake for security reasons is reasonable, allowing erasure of past owners' history by the current owner is counter to their stated purpose.

No, because the WHOIS details on the domain can change without ownership actually having changed, in the not-uncommon case where a domain starts out registered by a founder or early employee and is later transferred to the company proper.

Given the prevalence of bogus WHOIS data, the inverse is also possible: if the 2nd owner uses the same registrar and "privacy-protection" feature as the original owner, the WHOIS data could appear to have not changed, except for the start date of the registration, which would look identical to a single owner who re-registered their domain after allowing it to lapse.
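To make both failure modes concrete, here is a minimal sketch of the naive "compare WHOIS records" approach. The `WhoisRecord` type, its fields, and the sample values are all hypothetical; real WHOIS output is far messier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhoisRecord:
    # Hypothetical, simplified view of a WHOIS lookup result.
    registrant: str
    registrar: str


def looks_like_new_owner(before: WhoisRecord, after: WhoisRecord) -> bool:
    """Naive heuristic: any change in the record means a new owner."""
    return before != after


# Case 1: a founder transfers the domain to the company proper.
# Same real owner, but the heuristic reports a change (false positive).
founder = WhoisRecord("Jane Founder", "ExampleRegistrar")
company = WhoisRecord("Example Corp", "ExampleRegistrar")
false_positive = looks_like_new_owner(founder, company)

# Case 2: two unrelated owners behind the same privacy proxy.
# Different real owners, but the records are identical (false negative).
owner_a = WhoisRecord("Privacy Proxy Service", "ExampleRegistrar")
owner_b = WhoisRecord("Privacy Proxy Service", "ExampleRegistrar")
false_negative = not looks_like_new_owner(owner_a, owner_b)
```

Either way, the record comparison tells you about changes in the WHOIS text, not about who actually controls the domain.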

You set up a website, which fails to do authorization properly. Accidentally you expose personal information about your employees which gets harvested by the archive project. You fix the website, but how do you remove the exposed information from the archives?
Have you tried contacting them about such domains?