| https://web.archive.org https://commoncrawl.org I would prefer more of these. Alas, archive.today (archive.ph, archive.is, archive.vn, etc.) is sometimes blocked in some countries, sometimes serves CAPTCHAs, tries to create a "fingerprint" using JavaScript, and contains a tracking pixel. Neither Internet Archive nor Common Crawl does those things. (There are other archives I am not mentioning that do not do these things either.) When it works, archive.today may seem like a perfect solution to "paywalls". And then it stops working. In truth, most paywalls are solved by controlling HTTP headers like User-Agent (UA) and X-Forwarded-For, controlling JavaScript, and controlling cookies. This control requires no third-party intermediary (middleman) like archive.today. Or Internet Archive, for that matter. None of these archives is perfect, and it's true the public could use more of them. But there are better ways to avoid "paywalls", which are just a means of collecting data about non-subscribers while deliberately annoying them with JavaScript. |
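To make "controlling HTTP headers, JavaScript and cookies" concrete, here is a rough sketch using only Python's standard library. It is not a recipe for any particular site. The helper names `build_request` and `build_opener_without_cookies`, the `ExampleAgent/1.0` string, the `example.com` URL and the `203.0.113.7` address (from a documentation-only range) are all placeholders I made up for illustration. The point is only that the client, not a middleman, decides which headers go out, that no cookie jar exists unless you add one, and that nothing executes JavaScript.

```python
import urllib.request


def build_request(url, user_agent, extra_headers=None):
    """Return a Request carrying only the headers chosen by the caller.

    Hypothetical helper: url, user_agent and extra_headers are whatever
    the caller decides to send.
    """
    headers = {"User-Agent": user_agent}
    headers.update(extra_headers or {})
    return urllib.request.Request(url, headers=headers)


def build_opener_without_cookies():
    """Return an opener that stores and replays no cookies.

    urllib's default opener has no HTTPCookieProcessor, so cookies are
    ignored unless one is added explicitly. urllib also never runs
    JavaScript; it only returns the bytes the server sends.
    """
    opener = urllib.request.build_opener()
    # Drop urllib's default "Python-urllib/x.y" User-agent entry so the
    # opener contributes no header defaults of its own.
    opener.addheaders = []
    return opener


if __name__ == "__main__":
    req = build_request(
        "https://example.com/",                     # placeholder URL
        "ExampleAgent/1.0",                         # placeholder UA string
        {"X-Forwarded-For": "203.0.113.7"},         # placeholder address
    )
    # Inspect what would be sent; no network traffic happens here.
    for name, value in req.header_items():
        print(f"{name}: {value}")
    # To actually fetch, one would call:
    #   build_opener_without_cookies().open(req, timeout=10)
```

Running the script only prints the headers that would be sent. Whether any given site responds differently to them is up to that site, so treat this as a way to see what your own client is sending rather than as a guaranteed result.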