Hacker News
by Fnoord 957 days ago
Some possible clues:

> https://github.com/kubero-dev/ladder#environment-variables

> USER_AGENT | User agent to emulate | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

> X_FORWARDED_FOR | IP forwarder address | 66.249.66.1

> RULESET | URL to a ruleset file | https://raw.githubusercontent.com/kubero-dev/ladder/main/rul... or /path/to/my/rules.yaml
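
To make the clue concrete, here is a minimal Python sketch of what sending those two values as request headers could look like. It is not taken from ladder's source; the helper names `build_headers` and `build_request` are made up for illustration, and the defaults are simply the README values quoted above.

```python
import os
import urllib.request

# Defaults copied from the README values quoted above.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
DEFAULT_FORWARDED_FOR = "66.249.66.1"


def build_headers(env=None):
    """Return the spoofed headers, reading overrides from env (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "User-Agent": env.get("USER_AGENT", DEFAULT_USER_AGENT),
        "X-Forwarded-For": env.get("X_FORWARDED_FOR", DEFAULT_FORWARDED_FOR),
    }


def build_request(url, env=None):
    """Build (but do not send) a request carrying those headers."""
    return urllib.request.Request(url, headers=build_headers(env))
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` would actually send it. Whether the origin believes either header is entirely up to the site.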

2 comments

Oh wow... I'm surprised that's enough. When I was researching scraping protection bypass, you had to do some real crazy stuff with the browser instance + using residential IPs at a minimum...
That's not the full story. It works on many sites, but some (ft.com, for example) have stronger countermeasures against paywall bypass. For those, ladder modifies the HTML served by the origin to remove them.

Those rules still need to be built up (by me or the open-source community).
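
As a rough illustration of per-site rules that rewrite the served HTML, here is a small Python sketch. The rule table, its format, and the `apply_rules` name are all hypothetical and are not ladder's actual ruleset schema; regex-based HTML editing is also fragile and is used here only to keep the example short.

```python
import re

# Hypothetical rule table: domain -> list of (regex, replacement) pairs.
RULES = {
    "example.com": [
        (r'<div class="paywall">.*?</div>', ""),
        (r"<script[^>]*paywall[^>]*>.*?</script>", ""),
    ],
}


def apply_rules(domain, html, rules=RULES):
    """Apply every rule registered for the domain; unknown domains pass through."""
    for pattern, replacement in rules.get(domain, []):
        html = re.sub(pattern, replacement, html, flags=re.DOTALL | re.IGNORECASE)
    return html
```

For example, `apply_rules("example.com", page)` strips the paywall `div` and script from `page`, while a domain without rules gets its HTML back unchanged.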

I don’t know of any off-the-shelf product that respects X_FORWARDED_FOR unless the current request IP originates from a whitelisted (or LAN) address.
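
The usual pattern looks roughly like the Python sketch below: the forwarded address is only honored when the direct peer is a trusted proxy. The trusted ranges and the function names are assumptions chosen for illustration, not the behavior of any specific product.

```python
import ipaddress

# Assumed trusted proxy ranges (loopback and common LAN blocks); adjust as needed.
TRUSTED = [
    ipaddress.ip_network(n)
    for n in ("127.0.0.0/8", "10.0.0.0/8", "192.168.0.0/16")
]


def is_trusted(addr):
    """True if addr falls inside one of the trusted networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED)


def client_ip(peer_addr, forwarded_for=None):
    """Pick the client IP, ignoring X-Forwarded-For from untrusted peers."""
    if not forwarded_for or not is_trusted(peer_addr):
        return peer_addr
    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]
    # Walk from the nearest hop outward and stop at the first untrusted address.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return peer_addr
```

With this check, a request arriving directly from a public address that claims `X-Forwarded-For: 66.249.66.1` is still attributed to its real peer address; only a request relayed by a trusted proxy would be attributed to 66.249.66.1.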