

	by curioussavage 3104 days ago
	Any good open source desktop software with linux support to do this? I don't see why I would personally want a web app for this.

5 comments

joelhaasnoot 3104 days ago

It's a little clunky but here's the one I found best that just worked on Ubuntu: http://gscan2pdf.sourceforge.net/ . It can combine some of the best tools for OCR/cleanup/etc.

My main gripe is that I have a document feeder, and manually shift-selecting pages to combine into a single document and then clicking "Save as" is far too much of a hassle. There needs to be a better flow for that.


coaxial 3104 days ago

I wrote a collection of bash scripts for that. https://github.com/coaxial/insaned-config

It was initially to use with insaned, but I later came up with a script to tie it all together (scan.sh) because it's faster than jamming the scan button waiting for insaned to register. And with the script, I can queue commands provided I'm fast enough to swap the physical pages in the flatbed scanner.

It also uses the excellent textcleaner imagemagick script to clean up the scans and make them more ocr friendly.

The readme isn't totally up to date: parallel isn't required anymore, and there is no mention of the scan.sh script. But when you run it, it prompts for commands. You might need to edit the scripts to set your own output directories and textcleaner location.


jackvalentine 3104 days ago

I haven't tried this yet, but - https://openpaper.work/

Edit: tried it, it's crap.


jk2323 3102 days ago

May I ask why? Installation is a bit cumbersome but it seems to be an outstanding program to me. I have been looking very long for something like this.

I have not yet tried how it reacts to huge amounts of data. But the best thing: NOT written in Java!


jackvalentine 3102 days ago

I installed it on Windows, so the installer was the best bit :)

Maybe it's better on linux but it didn't use system dialogues, the UI behaved a bit strangely and it wasn't particularly intuitive.

Maybe I'm just not the target - in a previous life I supported an HP TRIM ECM, which may have left a mark on me.


dm319 3104 days ago

Maybe just put the scanned pdfs into a hierarchical folder system, then keep a text file at the root with comma or tab-separated location, ISO date and keywords.

Then your documents are a grep away. Maybe awk to find documents from a date range?

Maybe someone clever could automate this with the OCR output...
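
A rough sketch of the index-file idea in Python, in case grep/awk gets unwieldy. Everything concrete here is an assumption for illustration: the column order (path, ISO date, space-separated keywords), the tab delimiter, the sample rows, and the function name `search_index` are all hypothetical, not an existing tool.

```python
import csv
import io
from datetime import date


def search_index(index_file, keyword=None, start=None, end=None):
    """Yield (path, date, keywords) for rows matching a keyword and date range.

    Assumed layout (hypothetical): one document per line, tab-separated as
    path <TAB> ISO date <TAB> keywords.
    """
    for row in csv.reader(index_file, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        path, iso_date, keywords = row[0], row[1], row[2]
        try:
            doc_date = date.fromisoformat(iso_date)
        except ValueError:
            continue  # skip rows whose date is not YYYY-MM-DD
        if start is not None and doc_date < start:
            continue
        if end is not None and doc_date > end:
            continue
        if keyword is not None and keyword.lower() not in keywords.lower():
            continue
        yield path, doc_date, keywords


# Made-up sample index; in practice this would be the text file at the
# root of the folder tree, opened with open(..., newline="").
SAMPLE = (
    "taxes/2016/return.pdf\t2016-04-12\ttax return\n"
    "taxes/2017/return.pdf\t2017-04-10\ttax return\n"
    "bills/2016/power.pdf\t2016-09-01\telectricity bill\n"
)

if __name__ == "__main__":
    hits = search_index(
        io.StringIO(SAMPLE),
        keyword="tax",
        start=date(2016, 1, 1),
        end=date(2016, 12, 31),
    )
    for path, doc_date, keywords in hits:
        print(path, doc_date.isoformat(), keywords)
```

ISO dates sort and compare cleanly, which is what makes the date-range filter this simple. Filling the keywords column from OCR output would be a separate step and is not attempted here.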


arca_vorago 3104 days ago

There are, I just can't think of them at the moment. I know because I set up a book scanner with a Linux box. If I remember right, the scan/OCR/archive tools are all separate, so you would have to script them together.


JustSomeNobody 3104 days ago

Well, if you have a home server, having a web app works quite well. But if you don't, then a desktop app would probably be better.
