| HN Mirror



	by ce4 1302 days ago
	I use https://kebekus.gitlab.io/scantools for scanning. It builds on top of Tesseract and works great for PDF enhancements.

1 comments

rjzzleep 1302 days ago

You might be interested in https://github.com/ocrmypdf/OCRmyPDF then.

It does quite a bit of preprocessing on the PDF pages before passing them on to Tesseract.
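For anyone wanting to try it, here is a minimal sketch of driving it from Python. It assumes the `ocrmypdf` command-line tool is installed and on your PATH; the file names `scan.pdf` and `scan_ocr.pdf` and both helper function names are hypothetical, and the two flags shown are just a sample of the preprocessing options, not necessarily what you would want for every scan.

```python
import shutil
import subprocess
from typing import List


def build_ocrmypdf_command(src: str, dst: str, language: str = "eng") -> List[str]:
    """Assemble an ocrmypdf command line with some preprocessing enabled.

    Hypothetical helper for illustration; the flags are ocrmypdf CLI options.
    """
    return [
        "ocrmypdf",
        "--rotate-pages",  # correct pages that are rotated the wrong way
        "--deskew",        # straighten pages that were scanned slightly crooked
        "-l", language,    # Tesseract language pack to use for recognition
        src,
        dst,
    ]


def run_ocr(src: str, dst: str, language: str = "eng") -> int:
    """Run ocrmypdf if it is installed; return its exit code, or -1 if missing."""
    if shutil.which("ocrmypdf") is None:
        return -1
    return subprocess.run(build_ocrmypdf_command(src, dst, language)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute.
    print(" ".join(build_ocrmypdf_command("scan.pdf", "scan_ocr.pdf")))
```

Keeping the command assembly separate from the `subprocess` call makes it easy to inspect or log exactly what would be run before touching any files.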


angrygoat 1302 days ago

I've found ocrmypdf to be excellent: the only issue I've had is with PDFs with differing page sizes; it seems to scale everything up to the size of the largest page, which can be a bit of a pain.
