by amatecha 1044 days ago
Yeah wait, what? I hope that's incorrect. Updating robots.txt to disallow should only omit content from that point onward... It shouldn't be retroactive. What if the domain has a new owner, for example?
2 comments

It was correct. And archive.org used this self-inflicted problem to justify disregarding robots.txt.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

Oh, thank you for the update, that's great news! Agree with their position 100%.

> Agree with their position 100%.

Including the part where they blamed anyone but themselves for hiding old snapshots when robots.txt changed?

No, I didn't see that in there. I just agree that robots.txt makes sense for web crawlers but not for archival purposes.

It isn't. I really wish that, instead of wiping DECADES of history, it only applied to content archived from the day of the domain's registration. I think that would be slightly more reasonable, but I imagine they simply don't have access to such data.

Simply requiring domain owners to contact archive.org to remove old snapshots would be better than applying robots.txt retroactively.