Hacker News
by mackatsol 1762 days ago
I’ve heard of tools in Python that can extract data and text from PDF files…
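For PDFs that do have real text, here is a rough sketch in Python. It assumes the third-party pypdf package (`pip install pypdf`) and a hypothetical statement layout where each transaction sits on one line as date, description, amount. The regex and the helper names `extract_pdf_text` and `parse_transactions` are made up for illustration, and real statements will need their own pattern.

```python
import re

# Hypothetical line layout: "MM/DD/YYYY  Description  -1,234.56"
TRANSACTION_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<description>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"
)


def extract_pdf_text(path):
    """Return the text layer of each page, using pypdf (third-party)."""
    from pypdf import PdfReader  # imported lazily so the parser works without pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def parse_transactions(text):
    """Turn lines matching TRANSACTION_RE into (date, description, amount) tuples."""
    rows = []
    for line in text.splitlines():
        match = TRANSACTION_RE.match(line.strip())
        if match:
            # Drop thousands separators before converting the amount.
            amount = float(match.group("amount").replace(",", ""))
            rows.append((match.group("date"), match.group("description"), amount))
    return rows
```

Something like `parse_transactions("\n".join(extract_pdf_text("statement.pdf")))` would give rows you can write out with the csv module. This only helps if the PDF actually has a usable text layer.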

My bank offers CSV downloads of the same data. Look for that first! :)
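If a CSV export is available, reading it needs only the standard library. A minimal sketch; the column names `Date`, `Description` and `Amount` are guesses, so check the header row of your own bank's file.

```python
import csv


def load_csv_transactions(lines):
    """Read a bank CSV export into (date, description, amount) tuples.

    `lines` can be any iterable of text lines, e.g. a file opened with newline="".
    The column names below are assumptions about the export format.
    """
    reader = csv.DictReader(lines)
    return [
        (row["Date"], row["Description"], float(row["Amount"].replace(",", "")))
        for row in reader
    ]
```

For example, `with open("export.csv", newline="") as f: rows = load_csv_transactions(f)`, where `export.csv` stands in for whatever your bank names the download.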

The problem is that BofA considers this 'historic' data (older than 18 months, I think), so only PDF statements are available. The text is also encoded, which means I can't just copy and paste all the transaction text and clean it up in Sheets or a text editor. (Amex has PDFs with copyable text.)
By "encoded", do you mean it's an image and not text? That sounds like a deliberately obtuse way to do things.

Maybe crop your personal data out of the PDFs using a script, then use Mechanical Turk or some other piecework service to get your data typed in.
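A rough sketch of the cropping step, assuming the third-party pdf2image package (which needs poppler installed) and assuming the account holder's details sit in the top quarter of each page. It renders each page to an image and crops those pixels away, which actually removes the data; changing a PDF's crop box would only hide content that stays in the file. The function names, the output naming scheme and the 0.25 fraction are all hypothetical.

```python
def body_box(width, height, header_fraction=0.25):
    """Crop box (left, upper, right, lower) that drops the top header_fraction of a page."""
    top = int(height * header_fraction)
    return (0, top, width, height)


def crop_statement_pages(pdf_path, out_prefix, header_fraction=0.25, dpi=200):
    """Render each page to an image, cut off the header area, and save it as a PNG."""
    from pdf2image import convert_from_path  # third-party; returns PIL images

    paths = []
    for number, image in enumerate(convert_from_path(pdf_path, dpi=dpi), start=1):
        cropped = image.crop(body_box(image.width, image.height, header_fraction))
        out_path = f"{out_prefix}-page{number}.png"
        cropped.save(out_path)
        paths.append(out_path)
    return paths
```

Check a few of the cropped images by hand before sending them anywhere, since statement layouts vary and account numbers can also appear further down the page.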

Or try OCR.
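For the OCR route, a sketch assuming pytesseract (with the Tesseract binary installed) and pdf2image. The `has_text_layer` check is a crude, hypothetical heuristic for deciding whether a page needs OCR at all: it only catches pages with no extractable text, not text that comes out garbled because of odd font encodings.

```python
def has_text_layer(page_text, min_chars=20):
    """Heuristic: treat a page as image-only if extraction yields almost no text."""
    return len("".join(page_text.split())) >= min_chars


def ocr_pdf(pdf_path, dpi=300):
    """OCR every page with Tesseract via pytesseract (third-party; needs tesseract and poppler)."""
    import pytesseract
    from pdf2image import convert_from_path

    return [pytesseract.image_to_string(image) for image in convert_from_path(pdf_path, dpi=dpi)]
```

OCR output on amounts and dates is worth spot-checking against the statement totals, since misread digits are easy to miss.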