Hacker News new | ask | show | jobs
by joelvh 5694 days ago
If you were able to analyze the sentence structure of all 180 million questions, how many different sentence structures would there be? This all points to the fact that you can build algorithms to guess the answers eventually.
1 comments

Not even just guess them but accurately determine them.

A few years back I was hired by a third party to build a system to break the CAPTCHA on a popular site for various evil deeds. Morals set aside, the money was good and I had a wedding to pay for. A CAPTCHA system becomes quite breakable when it becomes predictable. The system in question used an image based CAPTCHA that used the same (albeit annoying) font for each image, as well as a static distortion overlay and a second set of random distortion. By extracting a thousand sample images I was able to build a system in Perl that could determine the text with an estimated 98% success rate - and when it failed you would just request a new CAPTCHA.

My solution would be to mix up images with logic. I.E.

In the following list of images, which image number contains the green animal: {pic of zebra}, {pic of frog}, {pic of giraffe}

This would require image recognition as well as logic.

Interestingly enough, WolframAlpha can generate a CAPTCHA image of each of these text questions, as to make it harder for a bot to decode AND answer the question! Check it out: http://www.wolframalpha.com/input/?i=CAPTCHA+What+is+seven+h...
It can't work the way you explained it: I just solved your CAPTCHA with 33% success rate (waaay too high for a useful CAPTCHA). Perhaps if you ask for "the three pictures of X animal out of those 9", and you had a database with which animal has which property (and you also ran some fuzz over the images so no two images would ever be the same). I'm still skeptical...
That would assume that it is multiple choice - however if it's still free form text input, requiring the input to equal "frog" would solve that issue. Text + images + logic would offer a lot more hurdles than just any single one of those.