by robertkeizer 1044 days ago
This is why I work at archive.org. Is it perfect? No. Does it have value to society? Absolutely.

I adore archive.org. I'm worried, though, that it is becoming something of a load-bearing element of civilization, given the importance of a shared and accurate history. We need redundancy.

~~I'm also worried about the deletion of old pages on archive because new owners of a domain update the robots.txt file to disallow it, which I've heard wipes the entire archive.org history of that domain. I hope that gets addressed.~~

Edit: this is no longer the case

Yeah, wait, what? I hope that's incorrect. Updating robots.txt to disallow crawling should only omit content from that point onward. It shouldn't be retroactive. What if a domain gets a new owner, for example?
It was correct, and archive.org used this self-inflicted problem to justify disregarding robots.txt.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

Oh thank you for the update, that's great news! Agree with their position 100%.
> Agree with their position 100%.

Including the part where they blamed anyone but themselves for hiding old snapshots when robots.txt changed?

No, I didn't see that in there. I just agree that robots.txt makes sense for web crawlers but not for archival purposes.
It isn't. I really wish that, instead of wiping DECADES of history, it only applied to content archived since the day the domain was registered by its current owner. I think this is slightly more reasonable, but I imagine they simply don't have access to such data.
Simply requiring domain owners contact archive.org to remove old snapshots would be better than applying robots.txt retroactively.
The Internet Archive is not really known for deleting anything. In many different postings across the years, their founder and employees have mentioned items being taken "out" of the wayback machine, not "deleted" out of it. I don't think you have anything to worry about.
It's absolutely essential and irreplaceable for the web archive, and that's why I was pretty angry that you guys decided to pick a fight with the big publishers over "loaning" ebooks, a fight that could have gotten the whole site killed.

It was possibly a worthwhile fight for someone to have, but not for the site that hosts the Wayback Machine. Separation of concerns, my friends...

What is it like to work there? Can you describe what the Archive does on a day-to-day basis? What do you need most, apart from donations, I guess?
I'd read a blog post about that!
You do great work.