by TekMol 1391 days ago
From the description, it sounds like this extension just puts "https://webcache.googleusercontent.com/search?q=cache:" in front of the current URL?

If so, you can also do this via a simple bookmarklet:

    javascript:location.href='https://webcache.googleusercontent.com/search?q=cache:'+location.href;{}

If you don't know what bookmarklets are: Edit any old bookmark and put the above line into the URL field. The next time you click it, it will take you to the Google cache of the page you are on.
3 comments

That doesn't seem to work well.

For example, take this NYT article https://www.nytimes.com/2022/08/31/health/life-expectancy-co...

Google cache:

https://webcache.googleusercontent.com/search?q=cache:https:... -- 404

OP's service:

https://cfworker-beatthatwall.jayass.workers.dev/?url=https:... -- works.

However, one can adapt your bookmarklet to use OP's service when needed, instead of installing an extension/userscript that seems to match all sites.
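
A rough sketch of that adaptation, in case it helps. The helper name `makeBookmarklet` is made up for illustration, and it assumes OP's worker accepts the raw page URL after `?url=`, as the example link above suggests; if it expects an encoded URL, `encodeURIComponent(location.href)` would be needed instead.

```javascript
// Hypothetical helper: builds a "javascript:" bookmarklet that prepends
// a service prefix to the URL of the page you are currently on.
function makeBookmarklet(prefix) {
  // JSON.stringify gives a safely quoted string literal for the prefix.
  // void(...) makes the expression evaluate to undefined, so the browser
  // does not try to render the assigned string as a page.
  return 'javascript:void(location.href=' + JSON.stringify(prefix) + '+location.href)';
}

// Prefixes taken from this thread.
const googleCache = makeBookmarklet('https://webcache.googleusercontent.com/search?q=cache:');
const opService = makeBookmarklet('https://cfworker-beatthatwall.jayass.workers.dev/?url=');

console.log(googleCache);
console.log(opService);
```

Paste either printed line into the URL field of a bookmark, same as with the original bookmarklet.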

It doesn't work because there is no Google cache entry for that page. They've instructed Google not to cache it.

> <meta data-rh="true" name="robots" content="noarchive, max-image-preview:large"/>

So, OP must be using some other means to retrieve the page.
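
If you want to check a page for this yourself, something like the following should do. It is only a sketch: `forbidsCaching` is a made-up name, and it only looks at the `content` of a robots meta tag, not at `X-Robots-Tag` response headers or crawler-specific tags.

```javascript
// Returns true if a robots meta "content" value includes the noarchive directive.
function forbidsCaching(robotsContent) {
  return robotsContent
    .split(',')                          // directives are comma-separated
    .map((d) => d.trim().toLowerCase())  // normalise whitespace and case
    .includes('noarchive');
}

// The value from the NYT meta tag quoted above:
console.log(forbidsCaching('noarchive, max-image-preview:large')); // true

// In a browser console you could feed it the live tag:
//   const meta = document.querySelector('meta[name="robots"]');
//   forbidsCaching(meta ? meta.content : '');
```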

In the case of NY Times, they're likely just grabbing the non-archived version and performing an operation similar to 12ft ladder.

Google cache fetching seems like it might be an effective strategy for a site like the Washington Post, which has extremely effective paywall enforcement (until you turn off JS) but also allows Google cache.

I do something similar, but use javascript:window.location.href="https://archive.is/newest/"+location.href to pull from archive.is. That works for many pages.
That's not the only thing it does, but it's the starting point. After that, it removes some scripts so the paywall doesn't show again after you load the page.
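
For anyone curious what the script-removal step could look like, here is a very rough sketch. It is not OP's actual code: `stripScripts` is a made-up name, it removes every `<script>` element rather than a selected few, and a regex is a crude stand-in for a real HTML parser.

```javascript
// Removes all <script>...</script> elements from an HTML string.
// Crude by design: a regex cannot handle every edge case of HTML.
function stripScripts(html) {
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
}

const page = '<p>Article text</p><script src="paywall.js"></script><p>More text</p>';
console.log(stripScripts(page)); // <p>Article text</p><p>More text</p>
```

A real service would presumably be more selective, since removing every script can also break parts of the page that are wanted.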