by tomatocracy 992 days ago
I started with a script similar to the one you're using (though hand-crafted) for my ScanSnap S1500. Mine runs the PDF conversion in the background so I can scan the next document immediately instead of waiting; this is easy to do now with scanpdf. I've been doing this for about 12 years. Originally I sorted everything into directories by hand and used "pdfgrep" to find things, but more recently I've put everything into a paperless-ngx instance and am gradually tagging the old documents.
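For anyone wanting to do the background conversion by hand, here is a minimal sketch of the idea, not my actual script. It assumes each scan lands in its own directory and that you supply a `convert` function that turns that directory into a PDF; the `BackgroundConverter` class and the `convert` callable are hypothetical names for illustration, and the real conversion command is left out on purpose.

```python
import queue
import threading
from pathlib import Path
from typing import Callable, List


class BackgroundConverter:
    """Run a slow conversion step on a worker thread so scanning can continue."""

    def __init__(self, convert: Callable[[Path], Path]) -> None:
        # `convert` is a hypothetical callable: scan directory in, PDF path out.
        self._convert = convert
        self._jobs = queue.Queue()
        self.results: List[Path] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, batch_dir: Path) -> None:
        """Queue a finished scan and return immediately."""
        self._jobs.put(batch_dir)

    def _run(self) -> None:
        # Process jobs one at a time, in the order they were scanned.
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: no more scans are coming
                break
            self.results.append(self._convert(job))

    def close(self) -> None:
        """Wait for all queued conversions to finish."""
        self._jobs.put(None)
        self._worker.join()
```

In use, the scan loop calls `submit()` after each document and goes straight back to scanning, then calls `close()` once at the end. A single worker keeps conversions in order and avoids several CPU-heavy OCR jobs competing with each other.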

I recently switched my hand-crafted scripts to scanpdf[1], which seems to give better results (once I tweaked it to be a little less eager to downconvert to B+W). I also experimented with OpenCV models for cropping and straightening (based on examples in a Stack Overflow thread at [2]), but so far the results have been worse than scanpdf's.

[1] http://badge.fury.io/py/scanpdf
[2] https://stackoverflow.com/questions/28935983/preprocessing-i...