by smoldesu 1067 days ago
I still use Tesseract. It's no longer the fastest or the most accurate option, but it gets what I need out of PDF files.

Does it work well with scanned PDFs? In my experiments it did not give the correct output.
Explore the different page segmentation modes, and make sure you are using v4 (it's a massive step up).
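
For anyone wanting to try this, here is a rough sketch of comparing page segmentation modes from Python by shelling out to the `tesseract` CLI. It assumes the scanned PDF has already been rasterized to an image (for example with `pdftoppm -r 300 -png`), since Tesseract reads images rather than PDFs. The helper names `build_command` and `try_modes` and the file name `page.png` are made up for illustration, and the set of modes is just a starting selection, not a recommendation from the thread.

```python
import shutil
import subprocess

# A few Tesseract page segmentation modes (--psm values) worth comparing.
PSM_MODES = {
    3: "fully automatic page segmentation (default)",
    4: "single column of text of variable sizes",
    6: "single uniform block of text",
    11: "sparse text",
}


def build_command(image_path, psm, lang="eng"):
    """Return the tesseract CLI invocation that prints OCR text to stdout.

    Hypothetical helper: it only assembles the argument list.
    """
    return ["tesseract", image_path, "stdout", "--psm", str(psm), "-l", lang]


def try_modes(image_path):
    """Run every mode in PSM_MODES and return {psm: recognized_text}.

    Returns an empty dict when the tesseract binary is not on PATH.
    """
    if shutil.which("tesseract") is None:
        return {}
    results = {}
    for psm in PSM_MODES:
        proc = subprocess.run(
            build_command(image_path, psm),
            capture_output=True,
            text=True,
        )
        results[psm] = proc.stdout
    return results
```

Calling `try_modes("page.png")` and eyeballing the output per mode is usually enough to see which one suits a given layout. Running `tesseract --version` first confirms whether you are actually on v4.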