by vidarh 4621 days ago
Not really, unless your corpus consists mainly of hopelessly distorted characters.

They state a captcha solving rate of around 90%.

For OCR to be cost-competitive, you typically need it to be correct on about 98% of characters or more; below that, it is typically cheaper to have a human type in the text than to have a human correct OCR'd text.
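To make the break-even reasoning concrete, here is a rough sketch of a per-character cost model. Everything in it is an assumption for illustration: the function names, the linear cost model, and especially the three cost figures, which are hypothetical and were picked so the threshold lands near 98%. Real costs depend on the material and the workflow.

```python
# Hypothetical per-character cost model (all figures are illustrative
# assumptions, not measured data).

def ocr_cost_per_char(accuracy, review_cost, fix_cost):
    """Cost per character of OCR followed by human correction.

    A human still reads every character (review_cost) and additionally
    pays fix_cost for each character the OCR got wrong.
    """
    return review_cost + (1.0 - accuracy) * fix_cost


def break_even_accuracy(typing_cost, review_cost, fix_cost):
    """Accuracy at which OCR plus correction costs the same as typing."""
    return 1.0 - (typing_cost - review_cost) / fix_cost


# Assumed relative costs: typing one character is the unit of cost,
# proofreading a character costs 0.8 units, and locating and fixing a
# wrong character costs 10 units.
TYPING_COST = 1.0
REVIEW_COST = 0.8
FIX_COST = 10.0

if __name__ == "__main__":
    threshold = break_even_accuracy(TYPING_COST, REVIEW_COST, FIX_COST)
    print(f"break-even accuracy: {threshold:.3f}")
    for accuracy in (0.90, 0.98, 0.99):
        cost = ocr_cost_per_char(accuracy, REVIEW_COST, FIX_COST)
        verdict = "cheaper" if cost < TYPING_COST else "not cheaper"
        print(f"accuracy {accuracy:.2f}: OCR cost {cost:.2f}, {verdict} than typing")
```

Under these assumed numbers, a 90% engine costs noticeably more per character than plain typing, while a 99% engine comes in under it.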

Modern OCR engines typically do better than 99% on text that isn't really badly damaged. My MSc dissertation was on error correction in OCR, and as part of that I tested some engines with pages that had been crumpled or intentionally damaged with sand and liquids; even then, many of the engines managed more than 99%.
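For reference, one common way to put a number on per-character accuracy is one minus the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The sketch below assumes that definition; it is not necessarily how the figures above were measured, and the function names are made up for the example.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def character_accuracy(reference, ocr_output):
    """Fraction of reference characters recovered correctly.

    Can drop below zero if the OCR output contains far more errors
    than the reference has characters.
    """
    if not reference:
        raise ValueError("reference text must not be empty")
    return 1.0 - levenshtein(reference, ocr_output) / len(reference)
```

For example, `character_accuracy("abcd", "abed")` gives 0.75: one substitution in four characters.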

1 comment

hi, would it be possible to see your dissertation somewhere? thx