by johanj
2973 days ago
robots.txt is really only supposed to be used for blocking the Internet Archive's first snapshot, not for removing existing snapshots, and even this might not be the case in the future as they try to preserve most snapshots. Last year they made a few changes to how they handle robots.txt files[1], among other things to deal with cases where a domain is sold and a new robots.txt file would otherwise result in old data being deleted.

[1]: https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...
As happened in this case: https://news.ycombinator.com/item?id=16919017
No? The article you linked says they've stopped paying attention to robots.txt for US government and military sites, but it looks like it still retroactively removes visibility for everything else.
I guess IA could change their practices. If Medium, or people like them, start actively using robots.txt to try to retroactively remove things from visibility in the archive, perhaps IA will change their practices/policy. I would welcome it.