by bhouston 442 days ago
So the website claims:

"Avoids bot detection and CAPTCHAs by using your real browser fingerprint."

Yeah, not really.

I used a similar system a few weeks back (one I wrote myself), having AI control my browser using my logged-in session. I started to get CAPTCHAs during my human sessions in the browser, and eventually I got blocked from a bunch of websites. Now that I've stopped using my browser session that way, the blocks have gone away, but be warned: you'll lose access to websites yourself doing this. It isn't a silver bullet.

5 comments

The caveat with these things is usually "when used with high quality proxies".

Also, I assume this extension is pretty obvious, so it won't take long for CF bot detection to treat it the same as Playwright or whatever else.

The extension enables debugging in your browser (a banner appears telling you about automation). It's possible to detect that in JavaScript.

Hence why projects like this exist: https://github.com/Kaliiiiiiiiii-Vinyzu/patchright. They hide the debugging part from JavaScript.

It might depend on the speed with which you click on the elements on the website.
It does. CF bans my own honest-to-God clicks if I do them too fast.
About five years ago, maybe more, Google started sending me captchas if I ran too many repetitive searches. I could be wrong, but it feels like most large platforms have fairly sophisticated anti-bot/scraping stuff in place.
Google does the same to me: Don't they know, I keep modifying my searches because their results sucked so bad I had to try 30 times to find the piece of information I needed?
GitHub regularly blocks me for some reason. They tell me to slow down and I’m blocked for hours. I don’t get it.
Remember when GitHub disabled searches for users who aren't logged in? Well, these days they just set the threshold for searches to 0, so they have de facto disabled them again, this time avoiding the shitstorm.
Make sure you are logged in. It was blocking me after just a couple searches if not logged in.
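For anyone wondering how a "slow down" block like this might be triggered, here is a minimal sketch of a sliding-window rate limiter. I have no idea what GitHub actually runs; the class name, the limits and the window length are all made up for illustration. Note how a limit of 0 rejects every request, which is the "threshold set to 0" case described above.

```python
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def allow(self, now):
        # Forget requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False  # caller is told to slow down
        self._timestamps.append(now)
        return True


# Assumed numbers: 3 searches per 60 seconds for a logged-in user.
logged_in = SlidingWindowLimiter(max_requests=3, window_seconds=60)
results = [logged_in.allow(t) for t in (0, 1, 2, 3)]

# A threshold of 0 for anonymous users blocks everything.
anonymous = SlidingWindowLimiter(max_requests=0, window_seconds=60)
```

The fourth search inside the window is refused, and a real service would presumably add a much longer penalty on top, which would explain being blocked for hours.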
Yandex does the same.
I use Vimium (Chrome extension for using keyboard control of the browser) and this happens to me as well since the behavior looks "unnatural".
Must suck for people with assistive software. I get blocked on CF for no damn reason.
Yeah, I do wonder if there are any ADA implications with that?
I really really hope there are. Not just because of people who need these provisions, but also for everyone else, as accessibility is the last line of defense for preserving end-user interoperability.

Screen readers need to see a de-bullshittified, machine-readable version of the site + this is required by law sometimes, and generally considered a nice thing to enable -> the site becomes not just screen-reader friendly, but end user automation-friendly in general.

(I don't know how long this will hold, though. LLMs are already capable of becoming a screen reader without any special provisions - they can make sense of the UI the same way a sighted person can. I wouldn't trust them much now, but they'll only get better.)

I wish people would stop using CF. It’s just making the internet worse.
How so?
Same here. And I am also using Vimium.
SSLy the speed clicker
What do you think they might be looking for that could be detected pretty quickly? I'm wondering if it is something like tracking mouse movement and calculating when a mouse is moving too cleanly, so adding some more human-like noise to the mouse movement could better bypass the system. Others have mentioned doing too many actions too fast, but what about the timing between actions? Even if every click isn't that fast, a very consistent delay between clicks would be another non-human sign.
Modern captchas use a number of tools, including many of the approaches you mentioned. This is why you might sometimes see a Cloudflare "I am not a robot" checkbox that checks itself and moves along before you have much time to even react. It's looking at a number of signals to determine that you're probably human before you've even checked the box.
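To make the "very consistent delay" idea concrete, here is a rough sketch of how such a check could look. This is only my guess at the general shape, not how any real vendor does it; the function names and the 0.1 cut-off are assumptions. It measures how much the gaps between clicks vary relative to their average (the coefficient of variation).

```python
import statistics


def interval_cv(click_times):
    """Coefficient of variation (stdev / mean) of the gaps between clicks."""
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    if len(gaps) < 2:
        return None  # not enough data to judge
    mean = statistics.mean(gaps)
    if mean <= 0:
        return None
    return statistics.stdev(gaps) / mean


def looks_scripted(click_times, min_cv=0.1):
    """Flag click streams whose timing is suspiciously regular.

    min_cv is an arbitrary illustrative cut-off, not a known real value.
    """
    cv = interval_cv(click_times)
    return cv is not None and cv < min_cv


# Made-up timestamps in seconds.
bot_clicks = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]    # exactly one second apart
human_clicks = [0.0, 0.8, 2.9, 3.4, 6.1, 7.0]  # irregular gaps
```

With these made-up numbers, `looks_scripted(bot_clicks)` comes out true and `looks_scripted(human_clicks)` false, even though neither stream is clicking fast.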
When I am using keyboard navigation, shortcuts and autofills, I seem to get mistaken for a bot a lot. These Captchas are really bad at detecting bots and really good at falsely labelling humans as bots.
With AI feeding / scraping traffic to sites growing ridiculously fast, I think captchas & their equivalents are only going to be on the rise. And given how many people I see selling residential proxies, I don't doubt that measures and counter-measures on both sides are getting more and more sophisticated.

> These Captchas are really bad at detecting bots and really good at falsely labelling humans as bots.

As a human it feels that way to you. I suspect their false-positive rate is very low.

Of course, you may well be right that you get pinged more because of your style of browsing, which sux.

Given the volume of bots they tend to be remarkably good at detecting bots

source: I work in a team that uses this kind of bot detection and yes, it works. And yes we do our best to keep false positives down

They're detecting patterns predominantly bots use. The fact that some humans also use them doesn't change that.

Back when I was playing Call of Duty 4, I got routinely accused of cheating because some people didn't think it was possible to click the mouse button as fast as I did.

To them it looked like I had some auto-trigger bot or Xbox controller.

I did in fact just have a good mouse and a quick finger.

What's different is the badness of the outcome: if children mislabel you as a cheater in CoD, you may get kicked from the server.

If Cloudflare mislabels you as a bot, however, you may be unable to access medical services or your bank account, or unable to check in for a flight, stuff like that. Actual important things.

So yes, I think it's not unreasonable to expect more from CF. The fact that some humans are routinely mischaracterized as bots should be a blocker level issue.

Does it suck? Yes, absolutely. Should CF continuously work to reduce false positives? Yes, absolutely.

I've never failed the CF bot test so don't know how that feels. Though I have managed to get to level 8 or 9 on Google's ReCaptcha in recent times, and actually given up a couple of times.

Though my point was just it's gonna boil down to a duck test, so if you walk like a duck and quack like a duck, CF might just think you're a duck.

Well you have to have false positives or negatives. Maybe they prefer positives
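The trade-off can be shown with a toy example. Assume each visitor gets a bot score between 0 and 1 and anything at or above a cut-off is blocked. The scores and labels below are invented, not real data; the point is only that moving the cut-off trades one kind of error for the other.

```python
def confusion_at(threshold, scored_visitors):
    """Count (false_positives, false_negatives) at a given cut-off.

    scored_visitors is a list of (score, is_actually_bot) pairs.
    A false positive is a human who gets blocked; a false negative
    is a bot that gets through.
    """
    false_positives = sum(
        1 for score, is_bot in scored_visitors if score >= threshold and not is_bot
    )
    false_negatives = sum(
        1 for score, is_bot in scored_visitors if score < threshold and is_bot
    )
    return false_positives, false_negatives


# Invented scores for three humans and three bots.
visitors = [
    (0.10, False), (0.45, False), (0.55, False),
    (0.40, True), (0.60, True), (0.90, True),
]

strict = confusion_at(0.35, visitors)    # blocks aggressively
balanced = confusion_at(0.50, visitors)
lenient = confusion_at(0.70, visitors)   # lets more through
```

In this toy data the strict cut-off blocks two humans and misses no bots, while the lenient one blocks no humans and misses two bots. "Preferring positives" just means choosing the strict end.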
> I'm wondering if it is something like they can track mouse movement

Yes, this is a big signal they use.
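As a rough illustration of what "moving too cleanly" could mean (my own simplification, not a known vendor method), one basic feature is path straightness: the straight-line distance between the start and end of a mouse movement divided by the distance actually travelled. The function name and sample coordinates are made up.

```python
import math


def straightness(points):
    """Straight-line distance / travelled distance; 1.0 means a perfect line."""
    if len(points) < 2:
        return None
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return None  # the pointer never moved
    return math.dist(points[0], points[-1]) / travelled


# Made-up pointer samples as (x, y) pixels.
scripted_path = [(0, 0), (50, 50), (100, 100)]                  # dead straight
wobbly_path = [(0, 0), (30, 10), (45, 40), (80, 60), (100, 100)]

scripted_ratio = straightness(scripted_path)
wobbly_ratio = straightness(wobbly_path)
```

The scripted path scores essentially 1.0 and the wobbly one noticeably lower. A real system would presumably look at speed, acceleration and curvature as well, not just this one ratio.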

> adding some more human like noise to the mouse

Yes, this is a standard avoidance strategy. Easier said than done. For every new noise generation method, they work on detection. They also detect more global usage patterns and other signals, so you'd need to imitate the entire workflow of being human. At least within the noise of their current models.

Have a lot of small things count towards the result. Users behave quite linearly, extra points if they act differently all of a sudden.
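A sketch of what "lots of small things count towards the result" might look like in code. The signal names, weights and the bonus for a sudden change in behaviour are all hypothetical, picked only to show the shape of the idea.

```python
# Hypothetical signals and weights, for illustration only.
WEIGHTS = {
    "very_regular_click_timing": 0.3,
    "perfectly_straight_mouse_paths": 0.3,
    "high_request_rate": 0.2,
    "automation_flag_visible": 0.4,
}
SUDDEN_CHANGE_BONUS = 0.25  # "extra points" when behaviour shifts abruptly


def bot_score(signals, behaviour_changed_suddenly=False):
    """Add up the weights of every signal that fired, capped at 1.0.

    signals maps a signal name to True/False; unknown names are ignored.
    """
    score = sum(
        WEIGHTS[name] for name, fired in signals.items() if fired and name in WEIGHTS
    )
    if behaviour_changed_suddenly:
        score += SUDDEN_CHANGE_BONUS
    return min(score, 1.0)


one_signal = bot_score({"high_request_rate": True})
with_change = bot_score({"high_request_rate": True}, behaviour_changed_suddenly=True)
everything = bot_score({name: True for name in WEIGHTS})
```

No single signal is enough to hit the cap on its own here, which is also why a fast-clicking or keyboard-driven human can still drift over whatever threshold is chosen.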
There's also the whole issue of captchas being in place because people cannot be trusted to behave appropriately with automation tools.

"Avoids bot detection and CAPTCHAs" - Sure asshole, but understand that's only in place because of people like you. If you truly need access to something, ask for an API. Maybe you need to pay for it, maybe you don't. Maybe you get it, maybe the site owner tells you to go pound sand, and you should take that to mean your behaviour and/or use case is not wanted.

Actually, the CAPTCHAs are in place mostly because of assholes like you abusing other assholes like you[0].

Most of the automated misbehavior is businesses doing it to other businesses - in many cases, it's direct competition, or a third party the competition outsources it to. Hell, your business is probably doing it to them too (ask the marketing agency you're outsourcing to).

> If you truly need access to something, ask for an API. Maybe you need to pay for it, maybe you don't.

Like you'd give it to me when you know I want it to skip your ads, or plug it to some automation or a streamlined UI, so I don't have to waste minutes of my life navigating your bloated, dog-slow SPA? But no, can't have users be invisible in analytics and operate outside your carefully designed sales funnel.

> Maybe you get it, maybe the site owner tells you to go pound sand, and you should take that to mean your behaviour and/or use case is not wanted.

Like they have a final say in this.

This is an evergreen discussion, and well-trodden ground. There is a reason the browser is also called "user agent"; there is a well-established separation between user's and server's zone of controls, so as a site owner, stop poking your nose where it doesn't belong.

--

[0] - Not "you" 'mrweasel personally, but "you" the imaginary speaker of your second paragraph.

It seems that we have very different types of businesses in mind. I really didn't consider tracking users and displaying ads, but I also don't think this is where these types of tools would be used. Well, they might, but that's as part of some content farm, undesirable bots and downright scams, so nothing of value is really lost if this didn't exist.

If you have a sales funnel, as in you take orders and ship something to a customer, consumer or business, I almost guarantee you that you can request an API, if the company you want to purchase from is large enough. They'll probably give you the API access for free, or as part of a signup fee and give you access to discounts. Sometimes that API might be an email, or a monthly Excel dump, but it's an API.

When we're talking about sites that survive purely on tracking users and reselling their data, then yes, they aren't going to give you API access. Some sites, like Reddit, do offer it, I think, but the price is going to be insane, reflecting their unwillingness to interact with users in this way.

> Not "you" 'mrweasel personally

Understood, but thank you :-)

> It seems that we have very different types of businesses in mind. I really didn't consider tracking users and displaying ads, but I also don't think this is where these types of tools would be used.

I wasn't thinking primarily about tracking and ads here either, when it comes to B2B automation. What I meant was e.g. shops automatically scraping competing stores on a continuous basis to adjust their own prices - a modern version of the old "send your employees incognito to the nearby stores and have them secretly note down prices". Then you also have comparison-shopping sites (pricing aggregators) that are after the same data, too.

And then of course there's automated reviews (reading and writing), trying to improve your standing and/or sabotage competition. There's all kinds of more or less legit business intelligence happening, etc. Then there's wholesale copying of sites (or just their data) for SEO content farms, and... I could go on.

Point being, it's not the people who want to streamline their own work, make access more convenient for themselves, etc. that are the badly-behaving actors and reasons for anti-bot defenses.

> If you have a sales funnel, as in you take orders and ship something to a customer, consumer or business, I almost guarantee you that you can request an API, if the company you want to purchase from is large enough. They'll probably give you the API access for free, or as part of a signup fee and give you access to discounts. Sometimes that API might be an email, or a monthly Excel dump, but it's an API.

The problem from the POV of a regular user like me is that I'm not in this for business directly; the services I use are either too small to bother providing me special APIs, or I am too small for them to care. All I need is to streamline my access patterns to services I already use, perhaps consolidate them with other services (that's what MCP is doing, with an LLM being the glue), without otherwise doing anything disruptive to their operations. And I'm denied that, because... Bots Bad, AI Bad, Also Pay Us For Privilege?

> When we're talking about sites that survive purely on tracking users and reselling their data, then yes, they aren't going to give you API access. Some sites, like Reddit, do offer it, I think, but the price is going to be insane, reflecting their unwillingness to interact with users in this way.

Reddit is an interesting case because the changes to their API and 3rd-party client policies happened recently, and clearly in response to the rise of LLMs. A lot of companies suddenly realized the vast troves of user-generated content they host are valuable beyond just building marketing profiles, and now they try to lock it all up in order to extort rent for it.