Hacker News
by thrdbndndn 1391 days ago
That doesn't seem to work well.

For example, take this NYT article https://www.nytimes.com/2022/08/31/health/life-expectancy-co...

Google cache:

https://webcache.googleusercontent.com/search?q=cache:https:... -- 404

OP's service:

https://cfworker-beatthatwall.jayass.workers.dev/?url=https:... -- works.

However, one can adapt your bookmarklet to use OP's service when needed, instead of installing an extension/userscript that seems to match all sites.
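
To make the idea concrete, here is a rough sketch of the URL-building part, in Python rather than bookmarklet JavaScript. The two prefixes are copied from the links above; the function names are made up for illustration, and I'm only assuming that OP's service takes the target URL in `?url=` with ":" and "/" left unescaped, since that is how the link is displayed.

```python
from urllib.parse import quote

# Prefixes copied from the links in this thread.
GOOGLE_CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"
OP_SERVICE_PREFIX = "https://cfworker-beatthatwall.jayass.workers.dev/?url="


def google_cache_url(article_url: str) -> str:
    """Build the Google cache lookup URL for an article (hypothetical helper)."""
    # Keep ":" and "/" readable; escape anything else that is unsafe in a URL.
    return GOOGLE_CACHE_PREFIX + quote(article_url, safe=":/")


def op_service_url(article_url: str) -> str:
    """Build the URL for OP's service (hypothetical helper).

    Assumption: the service accepts the target URL in the `url` query
    parameter in this form; the thread only shows a truncated example.
    """
    return OP_SERVICE_PREFIX + quote(article_url, safe=":/")


def candidates(article_url: str) -> list[str]:
    """URLs to try in order: Google cache first, OP's service as fallback."""
    return [google_cache_url(article_url), op_service_url(article_url)]
```

A bookmarklet would do the same string concatenation on `location.href` and only switch to the second URL when the first one 404s.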

1 comment

It doesn't work because there is no Google cache entry for that page. They've instructed Google not to cache it.

> <meta data-rh="true" name="robots" content="noarchive, max-image-preview:large"/>

So, OP must be using some other means to retrieve the page.
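
If you want to check a page for this yourself, a minimal sketch using only the Python standard library follows. The class and function names are mine, and it only looks at `<meta name="robots">`; a crawler-specific tag such as `name="googlebot"` or an `X-Robots-Tag` HTTP header could carry the same directive and is not handled here.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in (attr.get("content") or "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def forbids_cache(html: str) -> bool:
    """Return True if the page asks search engines not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives
```

Feeding it the tag quoted above returns True, which is consistent with the 404 from Google cache.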

In the case of NY Times, they're likely just grabbing the non-archived version and performing an operation similar to 12ft ladder.

Google cache fetching seems like it might be an effective strategy for a site like the Washington Post, which has extremely effective paywall enforcement (until you turn off JS) but also allows Google cache.