Hacker News new | ask | show | jobs
by lxgr 297 days ago
> How will Internet Archive operate?

Presumably increasingly less and less effectively, at least if they continue honoring robots.txt and don't implement scraping protection bypass mechanisms.

https://www.theverge.com/news/757538/reddit-internet-archive...

2 comments

IA has not honored robots.txt for the better part of a decade now.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...

Are you sure? The article (from 2017) you've linked only mentions "U.S. government and military web sites", and their wayback machine FAQ still mentions that robots.txt "might" prevent crawling:

https://help.archive.org/help/using-the-wayback-machine/

Interestingly, the article declares that Cloudflare is uncertain if the Internet Archive respects robots.txt