Hacker News
by ATechGuy 495 days ago
Looks like telling real humans apart from agents is going to be an arms race if the detection is based on browser/device fingerprinting or visual/audio captchas; AI will only get better.

What are captcha alternatives that can block resource consumption by bots?

4 comments

Setting request quotas per natural human. However, that has some problems to solve:

1. Who gets to decide who is a different natural human? I'm working on uniquonym (https://lemmy.amxl.com/c/project_uniquonym) that will leverage governments to decide this; other solutions include https://proofofhumanity.id/ and Worldcoin.

2. How do you avoid this becoming a supercookie tracking solution that badly impacts privacy? Zero-knowledge proofs provide some help here - there are ways to create an ID that changes at a set frequency and differs per site, such that the different IDs can't be correlated. That prevents long-term and cross-site tracking while still providing enough to rate-limit per natural person.

3. How do you stop people selling their identity to scrapers? This is a hard one to solve, but there are protocols that make it harder without giving up sensitive information or being interactively involved on an ongoing basis.
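To make point 2 a bit more concrete, here is a minimal Python sketch of only the ID-derivation part: an identifier that differs per site and rotates on a fixed schedule. This is not a zero-knowledge proof and not uniquonym's actual protocol. In a real scheme, a proof would additionally show that the ID was derived from a secret bound to a credential issued once per person, without revealing that secret. The names `person_secret`, `site_pseudonym`, and `EPOCH_SECONDS` are all hypothetical, and the one-day rotation period is an assumption.

```python
import hashlib
import hmac

# Assumed rotation period: one day. A real deployment would pick its own.
EPOCH_SECONDS = 24 * 60 * 60


def site_pseudonym(person_secret: bytes, site: str, now: float) -> str:
    """Derive an ID that differs per site and rotates every epoch.

    person_secret is a hypothetical per-person secret that never leaves
    the user's device. Without it, IDs for different sites or different
    epochs cannot be linked to each other.
    """
    epoch = int(now // EPOCH_SECONDS)
    message = f"{site}|{epoch}".encode("utf-8")
    return hmac.new(person_secret, message, hashlib.sha256).hexdigest()
```

A site could then count requests per pseudonym within the current epoch, which rate-limits per person without giving the site a stable, cross-site identifier.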

CAPTCHAs have been ineffective as a true "bot detection" technique for a while, as tools like anti-captcha.com allow outsourcing them to real humans. But they have been successful on the economic side, raising the cost of programmatic traffic on your site (which is good enough for some use cases).

Author of the agent detection post here. We agree that CAPTCHAs and vanilla browser/device fingerprinting are quickly losing their value in isolation, but we still see a lot of value in advanced network/device/browser fingerprinting.

The main reason is that the underlying corpus and specificity of the browser/device/network data points you get from fingerprinting make it much easier to build robust systems on top of it than on a binary CAPTCHA challenge. We've found it very useful to keep all of the foundational fingerprinting data as a primitive, because it let us build a comprehensive historical database of genuine browser signatures. We use that database to train ML models that detect subtle emulations and can reliably distinguish authentic browsers from agent-driven imitations.

That works really well for the OpenAI/BrowserBase models. Where it gets tricky is computer-use agents, where the agent is actually putting its hands on your keyboard and driving your real browser. Even then, the underlying fingerprinting data points are valuable, because you can still create intelligent rate limits on particular device characteristics and increase the cost of an attack by forcing the actor to buy additional hardware to run it.
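As a rough illustration of rate limiting on device characteristics, here is a minimal Python sketch of a sliding-window limiter keyed by a hash of fingerprint traits. It is not the commenter's actual system. The class name, the trait names used in the usage note, and the limits are all hypothetical, and a production system would use far richer signals and shared storage rather than an in-memory dict.

```python
import hashlib
from collections import defaultdict, deque


class FingerprintRateLimiter:
    """Sliding-window request limiter keyed by device traits (sketch)."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a fingerprint key to the timestamps of its recent requests.
        self._hits = defaultdict(deque)

    @staticmethod
    def key_for(traits: dict) -> str:
        # Sort the trait names so the same device always maps to the same key.
        canonical = "|".join(f"{k}={traits[k]}" for k in sorted(traits))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def allow(self, traits: dict, now: float) -> bool:
        hits = self._hits[self.key_for(traits)]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

For example, `FingerprintRateLimiter(2, 60).allow({"gpu": "x", "cores": 8}, now)` would admit two requests per minute from devices sharing those (made-up) traits. An attacker who wants more throughput then needs devices that genuinely differ in the measured characteristics, which is the cost increase described above.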

I don't think tracking everything is the way to go; the info would get outdated very soon, and tracking compromises user privacy. A simple solution could be to throw a challenge that humans can easily solve but agents absolutely cannot, now or in the future (think non-audio/visual/text).
A credit card.
I can give a credit card to my local AI.
I want people like you unleashing AI on my site, then.
Web Environment Integrity. Eventually your hardware will rat you out via attestation.
And you think nobody (professional hackers?) can put together a "virtual TPM" that falsifies real hardware info? I think there are much simpler solutions, but big tech wants to retain control.
The whole point of a TPM is that you cannot do that. And it's why Windows 11 requires a modern TPM.

It's a travesty of modern computing. As the owner of the hardware, I must be 100% able to control all aspects of it, and the TPM is one aspect you are gated out of.

Oh I can think of dystopian arrangements between Cloudflare, Google, Intel and AMD that'll fix that.