| HN Mirror



	by amatecha 1044 days ago
	Yeah wait, what? I hope that's incorrect. Updating robots.txt to disallow should only omit content from that point onward... It shouldn't be retroactive. What if the domain has a new owner, for example?

2 comments

pseudalopex 1044 days ago

It was correct. And archive.org used this self-inflicted problem to justify disregarding robots.txt.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


npunt 1044 days ago

Oh thank you for the update, that's great news! Agree with their position 100%.


pseudalopex 1044 days ago

> Agree with their position 100%.

Including the part where they blamed anyone but themselves for hiding old snapshots when robots.txt changed?


npunt 1044 days ago

No, I didn't see that in there. I just agree that robots.txt makes sense for web crawlers but not for archival purposes.


rany_ 1044 days ago

It isn't. I really wish that, instead of wiping DECADES of history, it only applied to content archived from the day of the domain's registration onward. I think this is slightly more reasonable, but I imagine they simply don't have access to such data.


pseudalopex 1044 days ago

Simply requiring domain owners to contact archive.org to remove old snapshots would be better than applying robots.txt retroactively.
