by judge2020
2552 days ago
I guess the big question is accuracy. If you have a brand-new dataset, couldn't bots answer the first few thousand images randomly and get through (since there is little or no basis for what counts as an accurate selection)? And if they do, how would that affect future real human selections (assuming the system learns over time which selections are accurate)? Another concern is that Google's existing Cloud Vision ML could very likely handle most of the classification challenges your clients are trying to train (since you're basically working against a much more widely deployed Mechanical Turk dataset, reCAPTCHA). High-profile websites (such as ecommerce sites) may have attackers (such as those with stolen credit cards) willing to spend the money needed to run all of your images through Cloud Vision. So I guess my question is: are other data points collected to prevent bots from getting through? I would understand if you can't answer some of these, as they may fall under "trade secret" territory.

Since our captcha gives websites an opportunity for monetization, we expect uses beyond just bot detection, for example as a replacement for the "disable ad-blocker" popup or as a way to replace paywalls with micropayments. This means there will be a broader set of users who are not focused on attacking our dataset and polluting it with bad results, which lets us start with a confidence model based purely on the site.
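To make the idea of a site-based confidence model concrete, here is a minimal sketch in Python. It is an illustration only, not a description of the actual system: the function name `label_confidence`, the `site_trust` table, and the default trust of 0.5 are all hypothetical. Each answer is weighted by how much the originating site is trusted, so answers from a site under attack count for less on a fresh dataset.

```python
from collections import defaultdict


def label_confidence(votes, site_trust, default_trust=0.5):
    """Pick the most trusted label for one image.

    votes: iterable of (site_id, label) pairs for a single image.
    site_trust: hypothetical mapping of site_id -> trust weight in [0, 1].
    Returns (best_label, confidence), where confidence is the share of
    total trust weight behind the winning label.
    """
    weights = defaultdict(float)
    for site_id, label in votes:
        # Assumption: a site with no history gets a neutral default weight.
        weights[label] += site_trust.get(site_id, default_trust)

    total = sum(weights.values())
    if total == 0:
        return None, 0.0

    best = max(weights, key=weights.get)
    return best, weights[best] / total


votes = [("news", "cat"), ("news", "cat"), ("shop", "dog")]
site_trust = {"news": 0.9, "shop": 0.3}  # made-up trust values
best, confidence = label_confidence(votes, site_trust)
```

In this toy input the two answers from the higher-trust site outweigh the single answer from the lower-trust one, so random guesses arriving through an attacked site move the label much less than they would under a plain majority vote.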
Having a state-of-the-art AI is table stakes for a captcha product. We already run our datasets through visual recognition systems and run our captcha with an AI model in the loop. In beta now, we offer websites under attack offline bot data in the background, currently as a batch report and soon as a webhook. This approach has the game-theoretic advantage of not leaking results to attackers, and it lets us run non-causal analysis of different attacks over a long period of time. By combining this approach with a variety of rotating challenges, we can identify patterns of behavior consistent with bots as they continue their attack strategy against only the mix of challenges they have seen.
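One way to picture the offline analysis with rotating challenges is the rough Python sketch below. Everything in it is an assumption made for illustration: the event format, the `flag_suspects` name, and the thresholds are hypothetical, and a real analysis would use many more signals. The idea is that a client tuned to the older challenge mix should do noticeably worse on challenge types rotated in later, while a human should do about equally well on both.

```python
from collections import defaultdict


def flag_suspects(events, new_types, min_gap=0.4, min_samples=5):
    """Flag clients whose accuracy drops sharply on newly rotated challenges.

    events: iterable of (client_id, challenge_type, correct) from a batch log.
    new_types: set of challenge types rotated in recently (assumed known).
    min_gap, min_samples: illustrative thresholds, not tuned values.
    """
    # Per client: [old_correct, old_total, new_correct, new_total]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    for client_id, challenge_type, correct in events:
        offset = 2 if challenge_type in new_types else 0
        stats[client_id][offset] += int(correct)
        stats[client_id][offset + 1] += 1

    suspects = []
    for client_id, (old_ok, old_n, new_ok, new_n) in stats.items():
        # Skip clients without enough answers in both groups to compare.
        if old_n < min_samples or new_n < min_samples:
            continue
        if old_ok / old_n - new_ok / new_n >= min_gap:
            suspects.append(client_id)
    return suspects
```

Because this runs on a batch log after the fact, nothing about the verdict is returned to the client at solve time, which matches the point above about not leaking results to attackers.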
There are also services where you can pay people to solve captchas for you. This is a different sort of attack from bots, since the solvers are in fact humans, signing up for hundreds of accounts. If your goal is to prevent fraudulent signups, or to host a giveaway for example, then we can have days of time to perform an extensive offline analysis, including an epidemic analysis of the traffic.