by ping_pong 1798 days ago
I used Tesseract almost 10 years ago to scan letters from a Words With Friends board. I was getting over 90% accuracy, but the score values printed on the letter tiles corrupted the letters and threw off the detection. So I created a new "language" (Tesseract supports custom ones) that incorporated the score-value corruption into the OCR translation. That got me to over 98% accuracy, which was about as good as I could get.

Overall I thought it was great, and I wonder how well it would perform these days with 10 years of improvements!