Hacker News new | ask | show | jobs
by RVuRnvbM2e 450 days ago
Were the walls you hit caused by Fastly's bot detection? I've found it to be quite accurate.

On the other hand CloudFlare and Akamai mistakenly block me all the damn time.

1 comments

It's not like they say, but it's at least three different implementations and I don't think any were cloudflare because I've been running into those pages for years and they've got captchas (functional or not). One of them was Akamai I think indeed
Yeah, I definitely don't want to pivot this thread into a product pitch, as the important thing is helping the open-source projects, but we can work with the maintainers to tune the systems to be as strict/lax as preferred. I'm sure the other services can too, to be fair.
The underlying issue is that many sites aren't going to get feedback from the real people they've blocked, so their operators won't actually know that tuning is required (also, the more strict the system, the higher percentage of requests will be marked as bots, which might lead an operator to want things to be even more strict...)
I will say -- a higher-end bot detection service should provide paper trails on the block actions they take (this may not be available for freemium tiers, depending on the vendor).

But to your point, the real kicker is the "many sites aren't going to get feedback from the real people they've blocked" since those tools inherently decided that the traffic was not human. You start getting into Westworld "doesn't look like anything to me" territory.

I'm not into westworld so can't speak to the latter paragraph, but as for "high-end" vendors' paper trail: how do log files help uncover false blocks? Any vendor will be able to look up these request IDs printed on the blocking page, but how does it help?

You don't know if each entry in the log is a real customer until they buy products proportional to some fraction of their page load rate, or real people until they submit useful content or whatever your site is about. Many people just read information without contributing to the site itself and that's okay, too. A list of blocked systems won't help; I run a server myself, I see the legit-looking user agent strings doing hundreds of thousands of requests, crawling past every page in sequence, but if there wasn't this inhuman request pattern and I just saw this user agent and IP address and other metadata among a list of blocked access attempts, I'd have no clue if the ban is legit or not

With these protection services, you can't know how much frustration is hiding in that paper trail, so I'm not blocking anyone from my sites; I'm making the system stand up to crawling. You have to do that regardless for search engines and traffic spikes like from HN