Hacker News
by bufordsharkley 4368 days ago
The thing that really frustrates me about the Internet Archive's treatment of robots.txt: if a domain expires and the domain provider changes the robots.txt to something restrictive, the Wayback Machine will completely clear the history of the site. That happens even though it is very clearly not the same agent at play: the new robots.txt does not come from the creator of the site's content. I've seen it happen, and it breaks my heart every time.
2 comments

Why wouldn't it consider the archived state of robots.txt?
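To make the difference concrete, here is a rough sketch contrasting the two policies. It judges every capture either by the live robots.txt (the retroactive behaviour described above) or by the robots.txt archived at capture time. It is a simplified model, not the Internet Archive's actual implementation. The robots.txt contents, the example.com URLs, the `ia_archiver` user agent, and the helper names are all assumptions for illustration. Only `urllib.robotparser` is a real standard-library module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt the original site owner served (assumed for illustration).
ORIGINAL_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical restrictive robots.txt served after the domain expired and was parked.
PARKED_ROBOTS = """\
User-agent: *
Disallow: /
"""

# Hypothetical archive: each capture stores its URL plus the robots.txt seen at capture time.
CAPTURES = [
    ("http://example.com/", ORIGINAL_ROBOTS),
    ("http://example.com/blog/post1", ORIGINAL_ROBOTS),
]


def allowed(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def visible_captures(captures, live_robots):
    """Retroactive policy: every capture is judged by today's robots.txt."""
    return [url for url, _robots_then in captures if allowed(live_robots, url)]


def visible_captures_at_capture_time(captures):
    """Alternative policy: each capture is judged by the robots.txt archived with it."""
    return [url for url, robots_then in captures if allowed(robots_then, url)]


if __name__ == "__main__":
    # The parked domain's blanket Disallow hides the whole history.
    print(visible_captures(CAPTURES, PARKED_ROBOTS))  # []
    # Judged by the original owner's rules, both pages stay visible.
    print(visible_captures_at_capture_time(CAPTURES))
    # ['http://example.com/', 'http://example.com/blog/post1']
```

The second policy still lets the original owner exclude paths they never wanted crawled (anything under `/private/` here), but a later registrant's robots.txt cannot erase captures made before they held the domain.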
One of the reasons I like archive.today. Obviously, they lack the depth of history, but they don't censor so easily.