HN Mirror

by Fnoord 957 days ago

Some possible clues:

> https://github.com/kubero-dev/ladder#environment-variables

> USER_AGENT: User agent to emulate, e.g. Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

> X_FORWARDED_FOR: IP forwarder address, e.g. 66.249.66.1

> RULESET: URL to a ruleset file, e.g. https://raw.githubusercontent.com/kubero-dev/ladder/main/rul... or /path/to/my/rules.yaml
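
To make the clue concrete, here is a minimal sketch of what a request carrying those two values could look like. It is not ladder's code, only an illustration using Python's standard library; the target URL and the `build_request` helper are made up for the example.

```python
import urllib.request

# Values copied from the environment-variable list quoted above.
USER_AGENT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
X_FORWARDED_FOR = "66.249.66.1"


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries both headers."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": USER_AGENT,
            "X-Forwarded-For": X_FORWARDED_FOR,
        },
    )


# Hypothetical target URL, used only for illustration.
req = build_request("https://example.com/article")
```

Passing `req` to `urllib.request.urlopen` would actually send it. Whether the origin treats the request as coming from a crawler depends entirely on how that site checks for crawlers.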

2 comments

janejeon 957 days ago

Oh wow... I'm surprised that's enough. When I was researching scraping protection bypass, you had to do some real crazy stuff with the browser instance + using residential IPs at a minimum...

2cpu1container 957 days ago

That's not the full story. It works on many sites, but some (ft.com, for example) have more severe countermeasures against paywall bypass. For those, ladder modifies the HTML served from the origin to remove them.

Those rules still need to be built up (by me or the open-source community).
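
As a rough illustration of what "modifying the served HTML" can mean, the sketch below strips `<script>` tags whose `src` contains a given fragment. The ruleset format is not shown in this thread, so the `strip_scripts` helper and the "paywall" fragment are hypothetical and say nothing about how ladder's rules are actually written. A regular expression is used only to keep the example short; a real rewriter would more likely use an HTML parser.

```python
import re


def strip_scripts(html: str, src_fragment: str) -> str:
    """Remove <script> elements whose src attribute contains src_fragment."""
    pattern = re.compile(
        r"<script\b[^>]*\bsrc=[\"'][^\"']*"
        + re.escape(src_fragment)
        + r"[^\"']*[\"'][^>]*>.*?</script>",
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html)


# Hypothetical page: one script to drop, one to keep.
page = (
    '<p>Hi</p>'
    '<script src="https://cdn.example.com/paywall.js"></script>'
    '<script src="/app.js"></script>'
)
cleaned = strip_scripts(page, "paywall")
```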

ComputerGuru 957 days ago

I don’t know of any off-the-shelf product that respects X_FORWARDED_FOR unless the request's IP originates from a whitelisted (or LAN) address.
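
The check described here can be sketched roughly as follows. This is not taken from any particular product; the trust list and the `client_ip` helper are assumptions made for the example, and header lookup is kept case-sensitive for brevity.

```python
import ipaddress

# Hypothetical trust list: loopback plus the private (LAN) IPv4 ranges.
TRUSTED_PROXIES = [
    ipaddress.ip_network(net)
    for net in ("127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def client_ip(remote_addr: str, headers: dict[str, str]) -> str:
    """Return the address to treat as the client.

    X-Forwarded-For is honoured only when the directly connected peer
    is in the trust list; otherwise the header is ignored.
    """
    peer = ipaddress.ip_address(remote_addr)
    forwarded = headers.get("X-Forwarded-For", "")
    if not forwarded or not any(peer in net for net in TRUSTED_PROXIES):
        return remote_addr
    # With a single trusted hop, the right-most entry is the address
    # that the trusted proxy itself saw.
    return forwarded.split(",")[-1].strip()
```

With this logic, a request arriving directly from a public address such as 203.0.113.7 keeps that address no matter what X-Forwarded-For claims, while the same header is accepted when the peer is 10.0.0.5.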
