Hacker News
by egsec 4210 days ago
How many photos are in the universe of possible photos? How long would it take to outsource tagging all of them so that a script could then do the matching?

Is the whole point of this to encourage hackers to get working on this AI challenge of identifying similar photos?

Either they need to hire a lot of people to sit around making these sets, or they have an automated way of creating them, which can be reversed. It looks like an arms race in which Google is paying people, but attackers can pay people to break the sets for less than it costs to create them (it takes less time to match images up than to find good photos, clean them up, tag them, etc.).

An attacker would also just target the database where all of this is stored. With the text reCAPTCHA, they have all of these photos and scanned books, and the answers are strings of 8+ characters from [a-zA-Z0-9]. Random guessing would not be good enough there, so the attacker needed to solve the OCR problem.
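To put rough numbers on why random guessing fails against the text challenge, here is a minimal sketch of the keyspace under the comment's own model: strings of exactly 8 characters drawn uniformly at random from [a-zA-Z0-9]. The uniform-random assumption and the helper names `keyspace` and `guess_success` are illustrative only, not anything from reCAPTCHA itself; because the comment says 8+ characters, length 8 is a lower bound.

```python
import string

# Alphabet from the comment's model: [a-zA-Z0-9], 62 symbols.
ALPHABET = string.ascii_letters + string.digits


def keyspace(length, alphabet=ALPHABET):
    """Number of distinct strings of exactly `length` symbols."""
    return len(alphabet) ** length


def guess_success(length, alphabet=ALPHABET):
    """Chance that one uniform random guess equals one uniform random answer (assumed model)."""
    return 1 / keyspace(length, alphabet)


print(keyspace(8))  # 218340105584896
print(f"{guess_success(8):.3e}")
```

At roughly 2.2 × 10^14 possibilities for the shortest strings, a blind guess essentially never succeeds, which is the comment's point about OCR being the only practical route.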

However, when the challenge asks you to select x of 9 images, and you assume the extreme answers (1, 2, 8, or 9 of the 9 images) are unlikely, I can hope to get lucky by picking 4 or 5 images each time, since order does not matter. If you distribute the attack to get around rate limits, etc., perhaps simply picking the first five images gives a sufficiently high success rate.
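One way to check the "pick the first five" idea is to count the possible answers. The sketch below assumes, hypothetically, that an answer must match the correct set exactly, that the correct set never has 0, 1, 2, 8, or 9 images, and that every remaining subset is equally likely. None of this is confirmed to be how the new image challenge actually works; `ALLOWED_SIZES`, `candidate_answers`, and `fixed_guess_success` are illustrative names.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N_IMAGES = 9
# Assumption: answer sizes 3..7 only (the "extremes" are excluded), and each
# allowed subset is equally likely to be the correct one.
ALLOWED_SIZES = range(3, 8)


def candidate_answers(n=N_IMAGES, sizes=ALLOWED_SIZES):
    """All subsets of image indices that could be the correct answer."""
    return [frozenset(c) for k in sizes for c in combinations(range(n), k)]


def fixed_guess_success(guess, answers):
    """Probability that always submitting `guess` exactly matches a uniform random answer."""
    target = frozenset(guess)
    hits = sum(1 for a in answers if a == target)
    return Fraction(hits, len(answers))


answers = candidate_answers()
# Sanity check: 84 + 126 + 126 + 84 + 36 subsets.
assert len(answers) == sum(comb(N_IMAGES, k) for k in ALLOWED_SIZES)

p = fixed_guess_success(range(5), answers)  # always pick the first five images
print(len(answers), p)  # 456 1/456
print("expected attempts per success:", 1 / p)  # geometric distribution mean
```

Under this uniform model every fixed guess of an allowed size has the same 1/456 odds, so choosing 4 versus 5 images only helps if real answers cluster around certain sizes. That clustering is exactly what a distributed attacker would have to measure.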


I think a good chunk of the images are captured by Google's Street View vehicles [1]. I see blurry images of house and apartment numbers all the time. So I'd imagine new images that haven't been seen before are constantly appearing, and Google can feed them into the reCAPTCHA system.

[1] http://www.google.com/recaptcha/intro/#creation-of-value

Correct, I am referring to the new No CAPTCHA system. Its images would get stale, unlike the scanned books, street signs, and house numbers in the traditional reCAPTCHA.

http://xkcd.com/1425/

Probably sums it up best.