by J_Darnley 3611 days ago
You don't have to request anything. Just alter robots.txt and you can make the Wayback Machine memory hole your entire website.
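To make the mechanism concrete, here is a minimal sketch of what such a robots.txt rule looks like and how a well-behaved crawler would evaluate it. It uses Python's standard `urllib.robotparser`. The `ia_archiver` user-agent string and the `example.com` URL are assumptions for illustration; check the archive's current documentation before relying on either.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ia_archiver" is assumed here to be the
# user agent the archive's crawler identifies itself with.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The archive crawler is blocked from every path on the site...
blocked = is_allowed(ROBOTS_TXT, "ia_archiver", "https://example.com/any/page")
# ...while a crawler not named in the file is unaffected.
other = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/any/page")
print(blocked, other)
```

Note that this only shows the crawl-time check. Whether an archive also hides captures it already holds is a policy decision on the archive's side, not something robots.txt itself specifies.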

The keyword here is 'your'.
Or someone else's website whose domain name you now own.
It's so infuriating when I come across a domain squatter that nuked the entire history of a domain in the Wayback Machine. I sort of get why they have to do that but it also defeats most of the point of the Wayback Machine.
> I sort of get why they have to do that

I don't. Can you explain?

Archive.org doesn't know the domain changed hands, only that it used to be allowed to show the results and no longer is.
Doesn't really explain why they have to nuke it, even if it is the current site owner. Respecting robots.txt is one thing, but that just means not spidering and archiving the content that is now there. Deleting already archived material based on later changes to robots.txt is a non-obvious behavior, given the usual understanding of the general meaning of robots.txt.
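The distinction can be sketched in a few lines. This is a toy model, not how archive.org is actually implemented: `Snapshot`, `visible_snapshots`, and the `retroactive` flag are all hypothetical names, and the dates are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    url: str
    captured: date


def visible_snapshots(
    snapshots: List[Snapshot],
    blocked_since: Optional[date],
    retroactive: bool,
) -> List[Snapshot]:
    """Return the captures an archive would still show.

    blocked_since: date the site's robots.txt started disallowing the
                   crawler, or None if it never did.
    retroactive:   True  -> the current robots.txt hides every capture.
                   False -> only captures from the block date onward
                            are withheld; older ones stay visible.
    """
    if blocked_since is None:
        return list(snapshots)
    if retroactive:
        return []
    return [s for s in snapshots if s.captured < blocked_since]


history = [
    Snapshot("https://example.com/", date(2005, 3, 1)),
    Snapshot("https://example.com/", date(2009, 7, 15)),
    Snapshot("https://example.com/", date(2014, 1, 10)),
]
block_date = date(2012, 6, 1)  # hypothetical: new owner adds a Disallow rule

crawl_time_only = visible_snapshots(history, block_date, retroactive=False)
retroactive_all = visible_snapshots(history, block_date, retroactive=True)
print(len(crawl_time_only), len(retroactive_all))
```

Under the crawl-time-only reading, the two captures from before the block survive. Under the retroactive reading, a single robots.txt change hides all three, which is the behavior being questioned here.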
Could they not also watch when a domain changes ownership and segment history based on the owner?

Over the time scales that archive.org holds on to data, domain ownership itself becomes part of the history. While permitting someone to hide a mistake for security reasons is reasonable, allowing erasure of past owners' history by the current owner is counter to their stated purpose.

You set up a website, which fails to do authorization properly. Accidentally you expose personal information about your employees which gets harvested by the archive project. You fix the website, but how do you remove the exposed information from the archives?
Have you tried contacting them about such domains?
Who else would be sending takedown requests?
Content owners who have already DMCA'd their content on some third party's website that the Wayback Machine has backed up.