by netsectoday 1347 days ago
If you expose a web server to the internet today you'll get 10 malicious requests for every 1 legitimate request.

This constant and unrelenting beating at your doors doesn't go away unless you add perimeter protection.

The options here are:

1) Block the IPs and CIDR ranges that are giving you trouble

2) Silently scan the connection request and block it when things look fishy

3) Provide a challenge in the return response that is difficult for bots to complete

Most of the bot protection on the internet is #2 where you don't notice you've been verified as a human and the site just loads. People hate #3 of completing a challenge, but the other option here is #1 where the site doesn't load at all.
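As a rough illustration of option #1, a CIDR blocklist check can be sketched with Python's standard `ipaddress` module. The ranges below are documentation-reserved placeholders and `is_blocked` is a hypothetical helper name, not anything a particular product uses:

```python
import ipaddress

# Hypothetical blocklist. These are documentation-reserved ranges,
# used here only as stand-ins for "ranges that are giving you trouble".
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24", "2001:db8::/32")
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed-version comparisons simply evaluate to False.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

In practice this kind of check usually lives in a firewall or reverse proxy rather than in application code, but the matching logic is the same.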

I'd argue that bots are breaking the internet.

2 comments

Cloudflare seems to have a 4th:

4) Provide a challenge in the return response that is impossible for anyone to complete

One way to see this one is to use Selenium to launch your browser. E.g., run this code in Python:

from selenium import webdriver

browser = webdriver.Chrome()

then when the browser launches start using it manually to surf the web [1]. This works great on most sites I've visited this way, including my financial institutions. But if it hits a Cloudflare CAPTCHA it fails. For example, try this on fanfiction.net. It hits the browser check page if I try to go to any category or story page. I click the checkbox to tell it I'm real, get the challenge to identify the lions or whatever, do that until it is satisfied I really can identify lions...and then it just goes back to the browser check page. As far as I can tell it is just an endless loop of checking the box and identifying the things at that point.

There are some settings you can use in Selenium to tell it to somewhat hide from the site that Selenium is involved, which for a while allowed getting past the CAPTCHA, but that stopped working after a while.
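For reference, the settings people usually mean look something like the sketch below. The specific flag and switch are an assumption on my part (they are the commonly cited ones), and as noted above they may no longer be enough to get past the check:

```python
# Commonly cited settings for making a Selenium-driven Chrome look less
# automated. Assumption: illustrative only, and not guaranteed to work.
HIDE_ARGS = ["--disable-blink-features=AutomationControlled"]
EXCLUDED_SWITCHES = ["enable-automation"]


def build_options():
    """Build ChromeOptions with the settings above applied."""
    # Imported lazily so the constants can be inspected without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HIDE_ARGS:
        options.add_argument(arg)
    # Drops the "controlled by automated test software" switch.
    options.add_experimental_option("excludeSwitches", EXCLUDED_SWITCHES)
    return options


def launch():
    """Launch Chrome with the options above, then drive it manually."""
    from selenium import webdriver

    return webdriver.Chrome(options=build_options())
```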

There's also a project somewhere on GitHub to make a Selenium Chrome driver specifically designed to not trigger bot detection, which also worked for a while and then stopped.

[1] Why would I want a Selenium-launched browser if I'm going to be using it manually? It's for sites where I want to do some automated things on just some pages. For example one of my financial institutions has a lot of options on their transaction download page, so after I finish manually doing things like checking balances, looking at recent activity, paying bills and want to finish by downloading transactions, I can have the script that launched the browser handle that.

Try launching the instance of Chrome with the `--disable-web-security` and `--disable-features=IsolateOrigins,site-per-process` options. I use these when launching Chrome via Playwright, and CAPTCHAs seemed to work fine several months ago.

When a Selenium worker is attached to a pay-for-solution CAPTCHA service, the infinite loop of CAPTCHAs that can be solved but don't provide access would be meant to drain you financially. You uncovered a pretty sweet (dark) pattern implemented by Cloudflare to screw bot owners.

This is just #2 and #3 combined.

It sounds like this is working as intended and also wastes your time with un-passable captchas instead of you spending more time trying to figure out how to get around their bot protection.

Another observation here is that you really shouldn't be hacking some scripts on top of your bank login. The banks know this and they are trying everything possible to dissuade you from doing this.

> you really shouldn't

Huh, apparently ‘the war on general computation’, of which Cory Doctorow spoke, won't necessarily be led by Disney and such corporations, but also by people denying others the right to automate the workings of the GUI on one's machine.

(Coincidentally, this practice might also preclude the operation of aeleveny tools—again, as Doctorow noted, ‘there is no known general-purpose computer that can execute all the programs except the naughty ones’. It might be fun to see the faces of the ‘you shouldn't’ folks when they're asked why less-able clients can't use their websites.)

> you really shouldn't be hacking some scripts on top of your bank login

You can hack whatever you want, but from a SECURITY perspective this is horrible and the banks know this. There are secure ways to store credentials for scripts but most people will just hard-code the values or stick them in unencrypted ENV vars. Also, whose fault is it when the bank updates their website and the Selenium script does something horribly wrong? Tell me more about Disney...
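As a minimal sketch of one alternative to hard-coding, the script can prompt for the password at run time so it is never written to disk or to an environment variable. `load_credentials` and the `BANK_USER` variable name are hypothetical, and an OS keychain or secrets manager would be the sturdier choice:

```python
import getpass
import os


def load_credentials(env_user="BANK_USER", ask_user=input,
                     ask_secret=getpass.getpass):
    """Return (username, password) without persisting the password.

    The username may come from an environment variable (it is not a
    secret); the password is always prompted for with echo disabled.
    The prompt functions are parameters so they can be swapped in tests.
    """
    username = os.environ.get(env_user) or ask_user("Username: ")
    password = ask_secret("Password: ")
    return username, password
```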

Service providers always want full control of the user experience and bots get in the way of that. We know this, but very often, that's not in the interests of the users at all.

Hence there are legitimate reasons to write bots and continue the arms race - otherwise, we'll pretty soon end up in a world where YouTube's business model of "subscribe to premium so we'll stop interrupting the videos when you minimize the app" will be the standard mode of operation.

Your argument would be fine without the name-calling. I can see both sides of this.

i have never had a site hacked and i dont even know or care if its being attacked - just dont litter it with rce vulns. if its being ddosed on the other hand, then use an anti ddos solution but your post is such corpo bullshit that i cant even tell if its talking about defending against ddos or defending against hacks (which you cant defend against, they will get around your filters within 5 minutes of playing around).