Hacker News
by halb 361 days ago
I guess the blame is on me here for providing only very brief context on the topic, which makes it sound like this is just about anti-scraping solutions.

Fingerprinting solutions of this kind are widely used, and their goal is not to directly detect or block bots, especially harmless scrapers. They just provide an additional data point that can be used to track patterns in website traffic and eventually block fraud or automated attacks, that is, the harmful kind of bot.

1 comment

If it's making a legitimate request, it's not an automated attack. If it's exceeding its usage quota, that's a simple problem that doesn't require eBPF.
What kind of websites do you have in mind when I talk about fraud patterns? Not everything is a static website, and I absolutely agree with you on that point: if your static website is struggling under the load of a scraper, there is something deeply wrong with your architecture. We live in wonderful times: Nginx on my 2015 laptop can gracefully handle 10k requests per second before I even enable rate limiting.
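
For reference, the "simple" usage-quota and rate-limiting case mentioned above can be sketched as a per-client token bucket. This is only an illustrative sketch, not how Nginx or any particular site implements it; the names `TokenBucket` and `RateLimiter`, the client key, and the rate and capacity values are all hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """One client's budget: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # hypothetical quota: tokens added per second
        self.capacity = capacity    # hypothetical burst allowance
        self.clock = clock          # injectable so the logic can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Spend one token if available; return False when over quota."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client key (for example an IP address or API key)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

A request handler would call `limiter.allow(client_id)` and reject the request when it returns `False`. Note that this only covers the quota case: it keys on whatever identifier the client presents, so it does nothing against traffic spread across many identities, which is the fraud case I am talking about.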

Unfortunately there are bad people out there, and they know how to write code. Take a look at popular websites like TikTok, Amazon, or Facebook. They are inundated with fraudulent requests whose goal is to use their services in a way that is harmful to others, or straight up illegal, from spam to money laundering. On social media, bots impersonate people in an attempt to influence public discourse and undermine democracies.

I run simple static sites from a (small) off-grid server at home. It has plenty of capacity for normal use, but cannot fully handle the huge traffic overshoots that bots and DoSes and poorly-written systems of household-name-multinationals inflict. I should not have to pay/scale to over-provision by an order of magnitude or more to stop the bullies and overbearing/idle from hurting genuine users. Luckily some relatively simple but carefully considered rules shut out much of the bad traffic while hurting almost no legitimate human visitor that I can find. Nuance and local circumstances are everything. But that took some engineering time on my part, that I also should not have had to spend. Particularly in fending off the nominally-nice multinationals.
This is an overly simplistic view that does not reflect reality in 2025.
The simple reality is that if you don't want to put something online, then don't put it online. If something should be behind locked doors, then put it behind locked doors. Don't do the dance of promising to have something online, then stop legitimate users when they request it. That's basically what a lot of "spam blockers" do -- they block a ton of legitimate use as well.