| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by seanhly 119 days ago

There are some pretty robust browser addons for bypassing article paywalls, notably https://gitflic.ru/project/magnolia1234/bypass-paywalls-fire...

This particular addon is blocked on most western git servers, but can still be installed from Russian git servers. It includes custom paywall-bypassing code for pretty much every news websites you could reasonably imagine, or at least those sites that use conditional paywalls (paywalls for humans, no paywalls for big search engines). It won't work on sites like Substack that use proper authenticated content pages, but these sorts of pages don't get picked up by archive.today either.

My guess would be that archive.today loads such an addon with its headless browser and thus bypasses paywalls that way. Even if publishers find a way to detect headless browsers, crawlers can also be written to operate with traditional web browsers where lots of anti-paywall addons can be installed.

3 comments

wuschel 119 days ago

Wow, did not know about the regional blocking of git servers! Makes me wonder what else is kept from the western audience, and for what reason this blocking is happening.

Thanks for sketching out their approach and for the URI.

link

pingou 119 days ago

But don't news websites check for ip addresses to make sure they really are from Google bots?

link

seanhly 119 days ago

Most of them don’t check the IP, it would seem. Google acquires new IPs all the time, plus there are a lot of other search systems that news publishers don’t want to accidentally miss out on. It’s mostly just client side JS hiding the content after a time delay or other techniques like that. I think the proportion of the population using these addons is so low, it would cost more in lost SEO for news publishers to restrict crawling to a subset of IPs.

link

expedition32 119 days ago

I use this add on. It does get blocked sometimes but they update the rules every couple of weeks.

link