by UltraSane 62 days ago
For books where you want to keep the formatting, the best option is Adobe Acrobat Pro and its Editable Text and Images feature. This replaces the scanned letters with a custom TrueType font. I used this in college to scan textbooks and it worked really well. Modern OCR on books is incredibly accurate.

see https://www.youtube.com/watch?v=bhJ9zqY8Da0

1 comment

A free, open-source alternative is Stirling PDF (https://github.com/Stirling-Tools/Stirling-PDF), which can do very accurate OCR while keeping the formatting.