by benbarbersmith 993 days ago
If you have any notes on this, I’d be incredibly grateful! I’ve been wanting to set this up for ages.

My solution was pretty much the same as this guy's. He had a slightly different scanner model from mine, but it's a very similar setup.

https://chrisschuld.com/2020/01/network-scanner-with-scansna...

I started with a hand-crafted script similar to the one you're using, driving my ScanSnap S1500. Mine runs the PDF conversion in the background, so I can scan the next document immediately instead of waiting; that's easy to do now with scanpdf. I've been doing this for about 12 years. Originally I sorted everything into directories by hand and used "pdfgrep" to find things, but more recently I've moved everything into a paperless-ngx instance and am gradually tagging all the old documents.
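One way to get the "convert in the background, keep scanning" behaviour is to hand each finished batch of pages to a worker pool and only wait for the results at the end. This is a minimal sketch under stated assumptions, not scanpdf's implementation: `scan_loop`, `scan_batch` and `convert` are hypothetical names, and in a real setup `convert` would typically shell out to whatever converter you use (for example via `subprocess.run`).

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


def scan_loop(
    scan_batch: Callable[[int], List[str]],
    convert: Callable[[List[str], str], str],
    n_documents: int,
) -> List[str]:
    """Scan n_documents, converting each one in the background.

    scan_batch(i) is a hypothetical callable that feeds document i through the
    scanner and returns its page image paths; convert(pages, out) is a
    hypothetical callable that builds the PDF and returns its path.
    """
    futures: List[Future] = []
    # max_workers caps how many conversions run at once; extra batches queue up.
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i in range(n_documents):
            pages = scan_batch(i)  # only this step blocks the scanner loop
            out = f"document-{i:04d}.pdf"
            # Hand the slow conversion to a worker and go straight back to scanning.
            futures.append(pool.submit(convert, pages, out))
        # Wait for every outstanding conversion, in submission order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    import time

    def fake_scan(i: int) -> List[str]:
        return [f"page-{i}-{p}.pnm" for p in range(2)]

    def fake_convert(pages: List[str], out: str) -> str:
        time.sleep(0.1)  # stand-in for slow OCR / PDF conversion
        return out

    print(scan_loop(fake_scan, fake_convert, 3))
```

Threads are enough here as long as `convert` mostly waits on an external process; if the conversion were pure-Python CPU work, a `ProcessPoolExecutor` would be the closer fit.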

I recently switched my hand-crafted scripts to scanpdf[1], which seems to give better results (once I tweaked it to be a little less eager to downconvert to B+W). I also experimented with OpenCV-based cropping and straightening, based on examples in a Stack Overflow thread[2], but so far the results have been worse than scanpdf's.

[1] http://badge.fury.io/py/scanpdf
[2] https://stackoverflow.com/questions/28935983/preprocessing-i...
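What "less eager to downconvert to B+W" means depends on the tool's internals, and scanpdf's actual logic may differ. As a hypothetical illustration, a simple heuristic counts how many pixels are noticeably colored and only converts a page to black-and-white when that fraction is tiny. The names `color_fraction`, `should_convert_to_bw`, `max_color_fraction` and `chroma_threshold` are assumptions for this sketch, not scanpdf options.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def color_fraction(pixels: Iterable[RGB], chroma_threshold: int = 40) -> float:
    """Fraction of pixels whose channel spread (max - min) exceeds chroma_threshold.

    Grey pixels have R, G and B roughly equal, so a large spread means "colored".
    """
    total = colored = 0
    for r, g, b in pixels:
        total += 1
        if max(r, g, b) - min(r, g, b) > chroma_threshold:
            colored += 1
    return colored / total if total else 0.0


def should_convert_to_bw(
    pixels: Iterable[RGB],
    max_color_fraction: float = 0.001,
    chroma_threshold: int = 40,
) -> bool:
    """Treat the page as black-and-white only if almost nothing on it is colored.

    Lowering max_color_fraction makes the conversion less eager: a page with a
    small colored stamp or highlight will then stay in color.
    """
    return color_fraction(pixels, chroma_threshold) <= max_color_fraction


if __name__ == "__main__":
    # 1000-pixel "page": mostly white paper with a small red stamp (0.5% of pixels).
    page = [(255, 255, 255)] * 995 + [(200, 30, 30)] * 5
    print(should_convert_to_bw(page, max_color_fraction=0.01))   # eager: True
    print(should_convert_to_bw(page, max_color_fraction=0.001))  # cautious: False
```

With Pillow, `Image.open(path).convert("RGB").getdata()` yields RGB tuples in the shape these functions expect.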

Paperless-ngx supports a consumption folder on disk: drop files into it and they get ingested automatically. Add a Samba container to your docker-compose file that shares the same directory, and you’ve replicated the same setup.
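If your scan script writes PDFs straight into the consume folder rather than going through Samba, one precaution is to make each file appear there in a single step, so the folder watcher never picks up a half-written PDF. A minimal sketch, assuming a staging directory on the same filesystem as the consume folder (`os.replace` is only atomic within one filesystem); `drop_into_consume_dir` is a hypothetical helper, not part of Paperless-ngx.

```python
import os
import shutil
import tempfile
from pathlib import Path


def drop_into_consume_dir(pdf: Path, consume_dir: Path, staging_dir: Path) -> Path:
    """Copy `pdf` into `consume_dir` so it appears there in one atomic step.

    `staging_dir` must be on the same filesystem as `consume_dir`, because
    os.replace() is only atomic within a single filesystem.
    """
    consume_dir.mkdir(parents=True, exist_ok=True)
    staging_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=staging_dir, suffix=".pdf")
    try:
        # Copy the finished PDF into the staging area first...
        with os.fdopen(fd, "wb") as dst, open(pdf, "rb") as src:
            shutil.copyfileobj(src, dst)
        # ...then rename it into the watched folder in one step, so the
        # consumer never sees a partially written file.
        final = consume_dir / pdf.name
        os.replace(tmp_name, final)  # overwrites an existing file of the same name
    except BaseException:
        # Don't leave a stray staging copy behind if anything failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final
```

Paperless-ngx may already wait for new files to settle before consuming them; this just avoids depending on how the watcher treats partial writes.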