People always use that link as a reference to say that the Internet Archive ignores robots.txt, but it only actually says they are ignoring it for government sites. It suggests that they might do it for other sites in the future (as of 2017), but does not actually say that they have done it.
That first link is confusing; it seems to say they ended up removing the pages not because of a legal threat but because of an “automated” robots.txt process.
If archive.org can be manipulated into removing content, either via legal threats or a simple robots.txt, it loses a significant portion of its societal value.