by cfcef
3843 days ago
I hope not. Tesseract delivers bad results on high-quality scans, far below the OCR quality achieved by services like Google Books. What the OCR market needs is someone who will bring that level of OCR quality, or better, to the masses (perhaps some deep learning grad student with time to kill?), not yet another wrapper around Tesseract. We have those already!
Here's a nice intro to ocropy[0][1] that later talks about how it achieves higher accuracy using an LSTM model[2].
[0] https://github.com/tmbdev/ocropy
[1] http://www.danvk.org/2015/01/09/extracting-text-from-an-imag...
[2] http://www.danvk.org/2015/01/11/training-an-ocropus-ocr-mode...