Hacker News new | ask | show | jobs
by vmarius 2174 days ago
AFAIK Tesseract is trained to recognize characters and uses a bunch of steps to prepare image for recognition. Steps like removing noise, fixing contrast and resizing.

It means that it performs not-so-good when for example image contains black text and white text on green background since this is not "normalized" through image preparation steps and it cannot detect white text on green background (but you can do it yourself)