by nerdjon
820 days ago
Well TIL that IA does not respect robots.txt. Does IA themselves block crawlers? It doesn't look like it according to their robots.txt, which even goes so far as to say "Please crawl our files." What would stop an actor from maliciously complying with a robots.txt file by just going to the Internet Archive instead?
At least, that's what they say[0].
>What would stop an actor from maliciously complying with a robots.txt file by just going to the Internet Archive instead?
Nothing; as far as I understand, scraping the public web is legal, or at least that's what courts have been saying lately. Btw, it's mind-boggling to me that after 30 years of the commercial Internet and Web, we still don't have a definite answer on whether scraping public websites and public web content is legal or illegal.
[0] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...