

by rfmw19 2356 days ago

My method was more specific to bills and finance documents. I used a generic photo scanner. It's not as automatic as the purpose-built document scanners that have automatic feeders and support multiple pages, but I wanted something that I could use for photography as well.

I coupled this with some very hacked-together Perl scripts that ran Tesseract OCR[1] and fed the extracted data into ledger-cli[2] for handling bills. I put other generic documents into folders by date.
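For anyone wanting a feel for that kind of pipeline, here is a minimal sketch in Python (the original Perl scripts aren't shown, so this is not a reconstruction of them). It assumes the `tesseract` binary is on the PATH, and the bill layout it parses (a "Date: MM/DD/YYYY" line and a "Total due: $..." line), the function names, and the account names are all made up for illustration. Real bills would need per-vendor patterns.

```python
import re
import subprocess
from datetime import date
from decimal import Decimal


def ocr_image(path):
    """Run the Tesseract CLI on one scanned image and return the recognized text."""
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    result = subprocess.run(
        ["tesseract", str(path), "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical bill layout, assumed for this sketch only.
DATE_RE = re.compile(r"Date:?\s*(\d{1,2})/(\d{1,2})/(\d{4})", re.IGNORECASE)
TOTAL_RE = re.compile(r"Total(?:\s+due)?:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)


def parse_bill(text):
    """Return (date, amount) from OCR text, or None if either field is missing."""
    date_m = DATE_RE.search(text)
    total_m = TOTAL_RE.search(text)
    if not date_m or not total_m:
        return None
    month, day, year = (int(g) for g in date_m.groups())
    # Decimal avoids float rounding on currency; strip thousands separators first.
    amount = Decimal(total_m.group(1).replace(",", ""))
    return date(year, month, day), amount


def to_ledger_entry(when, payee, amount,
                    expense="Expenses:Utilities", source="Assets:Checking"):
    """Format one transaction in ledger's plain-text journal syntax."""
    # Ledger needs at least two spaces between the account and the amount;
    # the last posting's amount is left out so ledger balances it.
    return (
        f"{when:%Y/%m/%d} {payee}\n"
        f"    {expense}  ${amount:.2f}\n"
        f"    {source}\n"
    )


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python bill2ledger.py scan.png "ACME Power" >> bills.ledger
    parsed = parse_bill(ocr_image(sys.argv[1]))
    if parsed is None:
        sys.exit("could not find a date and total in the OCR output")
    print(to_ledger_entry(parsed[0], sys.argv[2], parsed[1]))
```

The parsing and formatting are kept separate from the OCR call so they can be checked against saved text without rescanning anything. That matters because OCR output is noisy and most of the time goes into tuning the patterns.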

It worked pretty well, and I was able to generate some pretty graphs from data that was fully reconciled with my financial institutions (bank, credit card, investments, etc.), but it still took too much time. So what do I do now? Nothing!

This was years ago. I assume financial institutions now offer better support for extracting data, and that, coupled with improved OCR/machine learning, might make things more robust and make it worthwhile to try again.

[1] https://en.wikipedia.org/wiki/Tesseract_(software)

[2] https://www.ledger-cli.org/