by festivusr 5073 days ago
There was an interesting talk that mentioned CAPTCHA by one of its creators, Luis von Ahn, at the AAAI-12 conference on AI and robotics this past week.

In ReCAPTCHA, the two-word CAPTCHA version, one of the two words is taken from a scanned book. That (unknown) word was one that failed OCR for that book.

The other word is one that captcha already knows the answer to.

The assumption is that if you get the known captcha correct, then you probably got the other one correct as well (if it was possible to read it). The answer to the unknown word supplements the OCR of the book.

The two words are shown in random order, and you only have to get one of them right: the one the system already knows.
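
For anyone who wants to see the idea spelled out, here is a rough Python sketch of the two-word check as described in the talk. It is not reCAPTCHA's actual code: the names (`build_challenge`, `check_response`, `best_guess`, the `votes` store) and the `min_votes` agreement threshold are all made up for illustration.

```python
import random
from collections import Counter, defaultdict

# Hypothetical store: guesses collected for each unknown (OCR-failed) word.
votes = defaultdict(Counter)

def build_challenge(control, unknown_id, rng=random):
    """Return the two slots in random order so the user can't tell which is the control."""
    slots = [("control", control), ("unknown", unknown_id)]
    rng.shuffle(slots)
    return slots

def check_response(slots, answers):
    """Pass the user if the control word matches; record their guess for the unknown word."""
    passed = False
    guess = None
    unknown_id = None
    for (kind, value), answer in zip(slots, answers):
        text = answer.strip().lower()
        if kind == "control":
            passed = text == value.lower()
        else:
            unknown_id, guess = value, text
    # Only trust the unknown-word guess when the control word was right.
    if passed and guess:
        votes[unknown_id][guess] += 1
    return passed

def best_guess(unknown_id, min_votes=2):
    """Return the most common guess once enough users agree (assumed threshold), else None."""
    if not votes[unknown_id]:
        return None
    word, count = votes[unknown_id].most_common(1)[0]
    return word if count >= min_votes else None
```

The point of the sketch is that the unknown word never affects whether you pass. It only gets a vote when the control word was typed correctly, and the transcription is accepted once enough independent users agree.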

Luis's thought was that people are wasting all this time doing captchas, so why not use that time to do something useful, like helping digitize books?

As an aside, he's also one of the principal people behind Duolingo, which is a quite awesome language learning / human-assisted translation engine.

2 comments

Yeah. The actual problem as I see it is that people have been trained that you have to get captchas "right", whereas with these recaptchas all you really need is a reasonable guess because there is no 'right' (and nowhere does it say that).

The assumption behind recaptcha was a novel one, but it seems pretty obvious that the OCR is really just about as good as humans anyway - the 'difficult words' that usually get served are most commonly either nonexistent words (printing/writing errors) or scanning/cropping errors.

Luis von Ahn is one of the very few tech entrepreneurs from Guatemala. He's killing it.