by dil8 4224 days ago
Stuff like this is one of the many reasons I love archive.org. I think it's really important to capture historical artifacts for future analysis.

The service they provide doesn't allow the "Ministry of Truth"[1] to doctor historical documents to fit its present-day narrative.

[1] https://en.wikipedia.org/wiki/Ministry_of_Truth

2 comments

Sadly, it does.

archive.org respects the robots.txt of the current website owner. This can mean that they have the data but choose not to give you access to it. I have seen cases in the past where a website I once frequented became defunct, then the domain expired, then someone parked a holding page on that domain, including a robots.txt that keeps archive.org from displaying the old data (which does not even belong to the current owner of the domain!).

If they wanted to, there are a number of ways Uber could prevent archive.org from displaying that blog post. Many of these ways are due to the good faith under which archive.org operates (nobody is forcing them to respect robots.txt), and some even involve resorting to legal methods. But history is always mutable.

(Nothing but love on my end for archive.org, believe me! But I do want to point out the lengths that some people will go to in order to alter the historical record.)

The Internet Archive should implement some sort of digital signature system to allow website owners with foresight to prevent this.
They could just timestamp different versions of robots.txt (which they probably do already), and respect whichever version was in effect on the date of each capture (which is more of a hassle, because you have to build it into your UI logic).
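
To make the difference between the two policies concrete, here is a rough sketch using Python's standard urllib.robotparser. Everything specific in it is an assumption for illustration: the snapshot history, the dates, the example.com URL, and the helper names are hypothetical, and "ia_archiver" is used as the crawler's user-agent name. It is not how archive.org actually implements this.

```python
from bisect import bisect_right
from datetime import date
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt snapshots for one domain, oldest first.
# The second entry stands in for a new domain owner blocking the archive.
ROBOTS_HISTORY = [
    (date(2010, 1, 1), "User-agent: *\nAllow: /\n"),
    (date(2014, 6, 1), "User-agent: ia_archiver\nDisallow: /\n"),
]


def _parser(text):
    """Build a parser from robots.txt text without any network access."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def allowed_by_current(url, agent="ia_archiver"):
    """Policy described in the thread: only the newest robots.txt counts."""
    return _parser(ROBOTS_HISTORY[-1][1]).can_fetch(agent, url)


def allowed_by_date(url, captured_on, agent="ia_archiver"):
    """Proposed policy: use the robots.txt in effect when the page was captured."""
    dates = [d for d, _ in ROBOTS_HISTORY]
    i = bisect_right(dates, captured_on)
    if i == 0:
        return True  # no robots.txt snapshot existed yet at capture time
    return _parser(ROBOTS_HISTORY[i - 1][1]).can_fetch(agent, url)
```

With this history, a page captured in 2012 is hidden under the current-robots.txt policy but would stay visible under the date-based one, which is exactly the trade-off being argued about here.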
That would not solve the problem they're trying to solve.

Let's say I post something that I shouldn't have posted -- insider stock information, nude photos, whatever. Perhaps something illegal for me to post. I need to make it go away.

I need to be able to create a robots.txt today which affects stuff I posted yesterday.

This is why archive.org respects the current robots.txt for access to past content.

Thanks for the info, I didn't know that.
This is a good example of using the power of technology to enable the weak against the powerful.

And yes, I don't care what you think, but a company with a billion(ish) in funding is more powerful than YOU.