by bparsons 4620 days ago
So they have invented the world's best OCR software?
3 comments

Not really, unless your corpus consists mainly of hopelessly distorted characters.

They state a captcha solving rate of around 90%.

For OCR to be cost-competitive, you typically need it to be correct on about 98% of characters or more; below that, it is typically cheaper to have a human type in the text than to have a human correct OCR'd text.
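
To make the break-even argument concrete, here is a rough sketch under a very simple cost model: typing from scratch costs a flat amount per character, while correcting OCR output costs a proofreading pass per character plus a penalty per wrong character. The function names and all three cost figures are hypothetical, picked only so that the break-even lands at the 98% mentioned above; real costs will differ.

```python
def correction_cost_per_char(accuracy, proofread_cost, fix_cost):
    """Expected cost per character of having a human correct OCR output.

    Assumed model (not from any real study): every character is proofread,
    and each misrecognized character costs an extra fix_cost to repair.
    """
    return proofread_cost + (1.0 - accuracy) * fix_cost


def break_even_accuracy(typing_cost, proofread_cost, fix_cost):
    """Accuracy at which correcting OCR costs the same as typing from scratch."""
    return 1.0 - (typing_cost - proofread_cost) / fix_cost


# Hypothetical relative costs: typing = 1.0, proofreading = 0.6, fixing an error = 20.0
TYPING, PROOFREAD, FIX = 1.0, 0.6, 20.0

threshold = break_even_accuracy(TYPING, PROOFREAD, FIX)
cost_at_90 = correction_cost_per_char(0.90, PROOFREAD, FIX)
cost_at_99 = correction_cost_per_char(0.99, PROOFREAD, FIX)
```

With these made-up numbers, correcting 90%-accurate output costs more per character than simply typing it, while 99%-accurate output costs less.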

Modern OCR engines typically do better than 99% on text that isn't really badly damaged (my MSc. dissertation was on error correction in OCR, and as part of that I tested some engines with pages that had been crumpled, intentionally damaged with sand and liquids, and even then many of the engines managed more than 99%).

hi, would it be possible to see your dissertation somewhere? thx
Sorta relevant xkcd: http://xkcd.com/810/
Actually, they have invented a supplement to OCR software that will work for the characters that OCR is not certain about. The world's best OCR software would be software that recognizes when it should pass off a patch of text to this new AI engine to decode.
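
The hand-off idea could be sketched roughly like this, assuming the primary OCR engine reports a per-patch confidence score. `Patch`, `route_patches`, the 0.9 threshold, and the fallback callable are all hypothetical names for illustration, not any real engine's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Patch:
    image_id: str      # hypothetical handle to the image region
    ocr_text: str      # what the primary OCR engine read
    confidence: float  # primary engine's confidence, assumed to be in [0, 1]


def route_patches(
    patches: List[Patch],
    fallback: Callable[[Patch], str],
    threshold: float = 0.9,  # assumed cutoff; would need tuning on real data
) -> List[str]:
    """Keep confident OCR results; send uncertain patches to the fallback solver."""
    results = []
    for patch in patches:
        if patch.confidence >= threshold:
            results.append(patch.ocr_text)
        else:
            results.append(fallback(patch))
    return results


# Usage with a stand-in fallback that just tags the patch it was given.
patches = [Patch("p1", "hello", 0.99), Patch("p2", "w0r1d", 0.40)]
decoded = route_patches(patches, fallback=lambda p: f"<fallback:{p.image_id}>")
```

Only the second patch falls below the threshold here, so only it goes to the fallback.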