by YZF
4275 days ago
It has OCR, but it wasn't working that well. It uses Tesseract. I'm not sure why it worked poorly in the past; possibly something to do with different fonts or display rendering (e.g. ClearType). It "almost" worked, so maybe it has gotten better, or maybe there's some tuning you can do. I didn't spend too much time on it.
Ultimately, Tesseract was primarily designed to operate on text that had been printed and then scanned, whereas on-screen text is lower resolution, anti-aliased, on a coloured background, etc.
Some further details of our OCR investigations here: http://stb-tester.com/blog/2014/04/14/improving-ocr-accuracy...
The TL;DR version: training Tesseract on your font doesn't help; scaling up the text 3x before passing it to Tesseract gives a massive improvement (I don't know if Sikuli does this); normalising ligatures and punctuation gives an additional slight improvement.
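For anyone who wants to try the two tricks that helped, here is a rough Python sketch, not the actual stb-tester code. It assumes Pillow and pytesseract are installed; the function names, the bicubic resampling filter, and the exact punctuation mapping are my own illustrative choices, not something taken from the blog post.

```python
import unicodedata

# Hypothetical mapping from typographic punctuation to ASCII equivalents.
PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en and em dashes
}


def normalise(text):
    """Fold ligatures and typographic punctuation into plain characters."""
    # NFKC expands compatibility characters, e.g. the "fi" ligature U+FB01.
    text = unicodedata.normalize("NFKC", text)
    return text.translate(str.maketrans(PUNCTUATION))


def ocr_scaled(path, scale=3):
    """Upscale a screenshot before OCR, then normalise the result."""
    # Imported here so normalise() works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    img = Image.open(path)
    # Bicubic is an assumption; other resampling filters may do as well.
    big = img.resize((img.width * scale, img.height * scale), Image.BICUBIC)
    return normalise(pytesseract.image_to_string(big))
```

If you compare OCR output against an expected string, it probably makes sense to run both sides through `normalise()` so that a stray ligature or curly quote doesn't count as a mismatch.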