by mrkramer 819 days ago
>Well TIL that IA does not respect robots.txt.

At least, that's what they say[0].

>What would stop an actor from maliciously complying with a robots.txt file by just going to the Internet Archive instead?

Nothing; as far as I understand, scraping the public web is legal, or at least that's what courts have been saying lately. Btw, it's mind-boggling to me that after 30 years of the commercial Internet and Web, we still don't have a definite answer on whether scraping public websites and public web content is legal or illegal.

[0] https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

1 comment

> Nothing; as far as I understand, scraping the public web is legal, or at least that's what courts have been saying lately. Btw, it's mind-boggling to me that after 30 years of the commercial Internet and Web, we still don't have a definite answer on whether scraping public websites and public web content is legal or illegal.

I was thinking more about public perception than legality, but the legal side is a good question too.

Something like: "Yeah, I totally respected your robots.txt file. The only reason I have your data is that I crawled IA. See, they are the ones you should be mad at, not us."