by amirhirsch 2552 days ago

I work on bot detection, so I should be careful not to leak all of our approaches. Email me at amir@imachines.com and we can have a more in-depth conversation offline.

Since our captcha provides an opportunity for website monetization, we expect uses beyond bot detection, for example as a replacement for the "disable ad-blocker" popup or as a way to replace paywalls with micropayments. This means there will be a broader set of users who are not focused on attacking our dataset and polluting it with bad results, which lets us start with a confidence model based purely on the site.
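To make "a confidence model based purely on the site" a bit more concrete, here is a rough sketch of one way it could work. This is not our actual model: the function name, the Beta-style prior, and the parameters are all assumptions for illustration. The idea is that each site starts with a prior belief about how much of its traffic is human, and observed labels gradually outweigh that prior.

```python
def site_prior_confidence(prior_human, prior_weight, human_count, bot_count):
    """Posterior mean of "this request is human" for one site.

    Hypothetical model: the site's prior is treated as `prior_weight`
    pseudo-observations, split according to `prior_human`, and then
    combined with labeled traffic seen on that site so far.
    """
    alpha = prior_human * prior_weight + human_count
    beta = (1.0 - prior_human) * prior_weight + bot_count
    return alpha / (alpha + beta)


# A site we trust a lot, with no labeled traffic yet: confidence is the prior.
fresh_site = site_prior_confidence(0.9, 10, human_count=0, bot_count=0)

# A neutral site where 10 bot solves have since been observed.
attacked_site = site_prior_confidence(0.5, 10, human_count=0, bot_count=10)
```

With no observations the result is just the site prior; as labeled results accumulate, the site prior matters less and less.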

Having a state-of-the-art AI is table stakes for a captcha product. We already run our datasets through visual recognition systems and run our captcha with an AI model in the loop. In beta now, we offer websites under attack offline bot data computed in the background, currently as a batch report and soon as a webhook. This approach has the game-theoretic advantage of not leaking results to attackers, and it allows us to run non-causal analysis of different attacks over a long period of time. By combining it with a variety of rotating challenges, we can identify patterns of behavior consistent with bots as they continue their attack strategy against only the mix of challenges they have seen.
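As a toy illustration of the rotating-challenge idea (not our production pipeline; the event format, function name, and thresholds are made up for this example), an offline batch job could compare each client's pass rate on challenges that have been in rotation for a while against its pass rate on newly introduced ones. A solver tuned to the old mix tends to show a large gap.

```python
from collections import defaultdict


def flag_stale_solvers(events, min_attempts=5, gap_threshold=0.5):
    """Flag clients that pass old challenges far more often than new ones.

    events: iterable of (client_id, challenge_is_new, passed) tuples.
    Returns {client_id: pass_rate_on_old - pass_rate_on_new} for clients
    whose gap is at least `gap_threshold`.
    """
    # Per client: [passes, attempts] for old and for new challenges.
    stats = defaultdict(lambda: {"old": [0, 0], "new": [0, 0]})
    for client_id, is_new, passed in events:
        bucket = stats[client_id]["new" if is_new else "old"]
        bucket[0] += int(passed)
        bucket[1] += 1

    flagged = {}
    for client_id, s in stats.items():
        (old_pass, old_n), (new_pass, new_n) = s["old"], s["new"]
        # Skip clients without enough attempts on both kinds of challenge.
        if old_n < min_attempts or new_n < min_attempts:
            continue
        gap = old_pass / old_n - new_pass / new_n
        if gap >= gap_threshold:
            flagged[client_id] = gap
    return flagged


events = (
    [("bot", False, True)] * 5 + [("bot", True, False)] * 5
    + [("human", False, True)] * 4 + [("human", False, False)]
    + [("human", True, True)] * 4 + [("human", True, False)]
)
suspects = flag_stale_solvers(events)
```

Because this runs offline over the whole log, the attacker never sees which solves were counted against them.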

There are also services where you can pay people to solve captchas for you. This is a different sort of attack from bots, since they are in fact humans signing up for hundreds of accounts. If your goal is to prevent fraudulent signups, or to host a giveaway for example, then we have days to perform an extensive offline analysis, including an epidemic analysis of the traffic.