by 300bps 4732 days ago
I agree with your general point but your example is not correct

The Internet Archive makes it very easy to remove content from their site. I had a family web site accessible to the public for 12 years with about 25,000 pictures on it. For various reasons I took the site down, but archive.org still showed the pictures. A quick change to robots.txt stopped that, though. Basically, even if the archive grabbed the files at a point in time, they periodically check robots.txt to make sure they still have the right to display those files. If they don't, they won't display them. I was very impressed with them when I learned this.
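For anyone wondering what that kind of robots.txt change might look like, here is a rough sketch rather than the exact file I used. It assumes the archive's crawler identifies itself as "ia_archiver" (that name is an assumption here, so check the archive's FAQ), and the example.com URL and the is_blocked helper are made up for illustration. It uses Python's standard urllib.robotparser to check which user-agents the rules would shut out.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ia_archiver" is assumed to be the
# user-agent the Internet Archive honors; verify against their FAQ.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules forbid user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/photos/1.jpg"  # placeholder URL
    print(is_blocked(ROBOTS_TXT, "ia_archiver", url))
    print(is_blocked(ROBOTS_TXT, "SomeOtherBot", url))
```

This only shows how a crawler that obeys robots.txt would read the rules. Whether and when the archive re-checks the file and hides old snapshots is up to them.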


It's also IMO a reasonable question what the "right" behavior should be in such a situation. I'm tempted to make the argument that once content has been made available to the world and archived by the Internet Archive, my descendants or a new corporate owner shouldn't necessarily have the right to remove that content from public view at some time in the future.
I'm in this camp. I owned a 4-letter TLD that I was first registrant on in 1994 and held it until I sold it in 2001. I had lots of interesting things published on the site, and as soon as the new owner took the domain, he put up a robots.txt blocking the site and my years of content disappeared from the archive. :(

I keep toying with the idea of trying to buy the domain back, but its value has become somewhat prohibitive. Maybe when I win the lottery :/

Make sure you maintain control over the domain and always have a robots.txt file present, because if you ever lose that, those files will become visible again. Good luck arranging for your descendants to do this after you die.
"If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org."

http://archive.org/about/faqs.php#2

Just email them; they are nice people.
It would be nice to have some standard method to delete the content.
Emailing them is the standard method.

Sometimes, you know, you've got to, like, talk to people.

I did this with my old website and it worked great.
I'm pretty certain that this isn't the case. If anything, it's the other way around - if a domain changes hands, and the new owner sets up a restrictive robots.txt, the content in the archive is made unavailable, and stays unavailable even if the robots.txt (or the whole domain) later disappears.
This has been my experience as well. Once the content is deleted, it's gone.
This is not true.