

by lxgr 297 days ago

> How will Internet Archive operate?

Presumably less and less effectively, at least if they continue honoring robots.txt and don't implement mechanisms to bypass scraping protection.

https://www.theverge.com/news/757538/reddit-internet-archive...

2 comments

walski 297 days ago

IA has not honored robots.txt for the better part of a decade now.

https://blog.archive.org/2017/04/17/robots-txt-meant-for-sea...


lxgr 296 days ago

Are you sure? The article you've linked (from 2017) only mentions "U.S. government and military web sites", and their Wayback Machine FAQ still says that robots.txt "might" prevent crawling:

https://help.archive.org/help/using-the-wayback-machine/


overfeed 297 days ago

Interestingly, the article says that Cloudflare is unsure whether the Internet Archive respects robots.txt.
