Hacker News
by freedomben 1347 days ago
Make sure your Cloudflare settings are as aggressive as possible. You might need to upgrade to the first paid level (I think "pro"?) to activate the most aggressive settings, but it does work very well.

After that, you can throw a CAPTCHA on pages (particularly submission pages), but that will harm legitimate users as well as bots.

Make sure your origin server is only reachable from Cloudflare. If people can hit it directly, then they bypass Cloudflare. If you use firewalld, I wrote this in my setup script that you can use:

    # Allow ports 80 and 443 only from Cloudflare's published IPv4 ranges
    # (IPv6 ranges are not handled here).
    for range in $(curl -s "https://api.cloudflare.com/client/v4/ips" | jq -r '.result.ipv4_cidrs[]'); do
      for port in 80 443; do
        echo "Inserting firewalld rule for address range '${range}' on port '${port}'"
        firewall-cmd --zone=public --permanent \
          --add-rich-rule="rule family=\"ipv4\" source address=\"${range}\" port protocol=\"tcp\" port=\"${port}\" accept"
      done
    done

    # Drop the blanket http/https services so only the rules above admit
    # web traffic, then reload to apply the permanent configuration.
    firewall-cmd --remove-service=http --permanent
    firewall-cmd --remove-service=https --permanent
    firewall-cmd --reload
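
If you want to see what a loop like this would do before touching the firewall, here is a rough dry-run sketch. It reads CIDR ranges on stdin and only prints the firewall-cmd commands, picking the rule family from whether the range contains a colon. The helper name `emit_rules` is made up for illustration, and it assumes the same public zone and ports as the script above.

```shell
# Dry-run sketch: print (do not run) the firewall-cmd commands for each
# CIDR range read from stdin, one range per line.
# emit_rules is a hypothetical helper name, not part of firewalld.
emit_rules() {
  local range family port
  while IFS= read -r range; do
    # Skip blank lines.
    [ -z "$range" ] && continue
    # Only IPv6 ranges contain a colon.
    case "$range" in
      *:*) family=ipv6 ;;
      *)   family=ipv4 ;;
    esac
    for port in 80 443; do
      printf '%s\n' "firewall-cmd --zone=public --permanent --add-rich-rule='rule family=\"${family}\" source address=\"${range}\" port protocol=\"tcp\" port=\"${port}\" accept'"
    done
  done
}

# Possible usage (assumes the API response also has an ipv6_cidrs list):
#   curl -s "https://api.cloudflare.com/client/v4/ips" \
#     | jq -r '.result.ipv4_cidrs[], .result.ipv6_cidrs[]' | emit_rules
```

Review the printed commands first; once they look right, they could be piped to `sh`, followed by a `firewall-cmd --reload`.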
2 comments

> If you use firewalld, I wrote this in my setup script that you can use:

Aren't you supposed to use Argo or certificate authentication for this?

Not supposed to. These are all valid options.
In my case I am concerned about false positives, since visitor experience is a higher priority than blocking all bots. Cloudflare, in my experience, does generate too many false positives when it's too aggressive. It's a very nice idea in other cases, though.

Define what visitor experience means to you. Lots of folks think a CAPTCHA is acceptable, or that loading an operating system's worth of JavaScript, like Reddit does, is acceptable. (If we define OSes as bloatware, like what MS ships nowadays or any carrier-specific phone, LOL, one could say Reddit is a JavaScript OS minus a good website, for those emacs/vi fans.)

You mentioned that allow-paths is not quite an option because the main page gets hit by the bots. How are you detecting this? Maybe some automation here is all that is needed. Note that lots of folks use ad blockers of various sorts, which some analytics sites classify as 'bots' or 'grey visitors'. That makes even landing on some home pages a very sad experience when a full-blown CAPTCHA shows up (then I stay away for sure).
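
As one possible starting point for that kind of automation, here is a minimal sketch that counts requests to the main page per client address from an access log. It assumes the common/combined log format (client address in field 1, request path in field 7); `top_home_hitters` is a made-up helper name.

```shell
# top_home_hitters: hypothetical helper. Reads access-log lines on stdin and
# prints "count address" for requests whose path is exactly "/", busiest first.
# Assumes common/combined log format: field 1 = client address, field 7 = path.
top_home_hitters() {
  awk '$7 == "/" { hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' \
    | sort -rn
}

# Possible usage (the log path is an assumption; adjust for your server):
#   top_home_hitters < /var/log/nginx/access.log | head -n 20
```

Behind Cloudflare, field 1 will be a Cloudflare address unless the server is set up to log the real visitor address (for example from the CF-Connecting-IP header), so treat the counts as a hint, not proof of bot traffic.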