by wswope 2373 days ago
Potentially helpful notes:

The character whitelist/blacklist functionality doesn't work for the default LSTM-based engine.
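If you do need a whitelist, one workaround is to fall back to the legacy engine. Here's a rough sketch that builds the CLI invocation; `--oem 0` and `-c tessedit_char_whitelist=...` are real Tesseract options, but the helper names are made up for illustration, and it assumes your installed traineddata still includes the legacy model (the tessdata_fast and tessdata_best files are LSTM-only, so this won't work with those).

```python
import subprocess


def build_legacy_whitelist_cmd(image_path, whitelist, lang="eng"):
    """Build a tesseract command that uses the legacy engine with a whitelist.

    Hypothetical helper: only the flags themselves come from Tesseract.
    """
    return [
        "tesseract",
        image_path,
        "stdout",          # write recognized text to stdout instead of a file
        "-l", lang,
        "--oem", "0",      # 0 = legacy engine only (needs legacy traineddata)
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def run_ocr(image_path, whitelist):
    """Run the command and return the recognized text (requires tesseract on PATH)."""
    cmd = build_legacy_whitelist_cmd(image_path, whitelist)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Something like `run_ocr("receipt.png", "0123456789.")` would then restrict output to digits and periods, assuming the legacy model is available.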

Regarding preprocessing, upscaling the image before running OCR can have a dramatic impact on recognition accuracy.
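A minimal sketch of the upscaling step, assuming Pillow is installed. The 2x factor and the function names are just placeholders I picked, not a recommendation; the right factor depends on your source images, so it's worth trying a few.

```python
def scaled_size(size, factor):
    """Return the (width, height) after scaling, rounded to whole pixels."""
    width, height = size
    return (max(1, round(width * factor)), max(1, round(height * factor)))


def upscale_for_ocr(in_path, out_path, factor=2.0):
    """Upscale an image and save it, ready to be passed to Tesseract.

    The default factor is an arbitrary starting point, not a tuned value.
    """
    from PIL import Image  # imported here so scaled_size works without Pillow

    with Image.open(in_path) as img:
        new_size = scaled_size(img.size, factor)
        # LANCZOS is a high-quality resampling filter; BICUBIC is also worth trying
        upscaled = img.resize(new_size, resample=Image.LANCZOS)
        upscaled.save(out_path)
    return new_size
```

Usage would be something like `upscale_for_ocr("scan.png", "scan_2x.png")`, then running Tesseract on `scan_2x.png`.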

IIRC tessdata_fast (which the article mentions) is the default that ships with most prebuilt versions of Tesseract, so you probably don't need to mess with that. In my use case, I found that tessdata_best actually performed slightly worse in terms of accuracy.