by andrejk 2531 days ago
Azure has a separate service to read forms and formatted docs. https://azure.microsoft.com/en-us/services/cognitive-service...

Thanks, I was not aware! I’ll have to figure out how to easily test it (AWS Textract is actually really nice in that regard).
Do you know how good it is? I have a LOT of structured documents that I need to OCR.
MSFT person here - give it a try! Sign up, and you get a free trial that lets you benchmark it easily.
Interesting. That is actually something I've been looking for, and I do have an MSDN sub.

FYI, that defaulted to Indian rupee as the currency for me (UK-based, zero Indian connections). Weird.

Have you tried Tabula, the free program? I use it for some finance work, reading filings.

http://tabula.technology/

Yep! It's great, but is maybe 60% there, so I'm looking for something that can extract much more structure from a document. I doubt what I'm looking for will exist for another 10 years, though.
Is it feasible to create loose templates for where the data is and extract that way? I have a mothballed project that did pretty well with this. It was able to discern different templates from a mass of documents.
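For anyone wondering what "loose templates" might look like in practice, here is a minimal sketch. It is not the mothballed project's code. It assumes an OCR step has already produced words with page-relative centre coordinates, and every name in it (`Word`, `Region`, `extract_fields`, `INVOICE_TEMPLATE`, the coordinates, the `slack` tolerance) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR'd word; the coordinates are assumed to come from an OCR step."""
    text: str
    x: float  # horizontal centre, as a fraction of page width (0..1)
    y: float  # vertical centre, as a fraction of page height (0..1)


@dataclass(frozen=True)
class Region:
    """A rectangular area of the page where a field is expected to appear."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word: Word, slack: float) -> bool:
        # "Loose" matching: grow the box by `slack` on every side so small
        # scan offsets do not cause a miss.
        return (self.left - slack <= word.x <= self.right + slack
                and self.top - slack <= word.y <= self.bottom + slack)


def extract_fields(words, template, slack=0.02):
    """Return {field_name: text} built from the words inside each region."""
    result = {}
    for name, region in template.items():
        hits = [w for w in words if region.contains(w, slack)]
        # Approximate reading order: top to bottom, then left to right.
        hits.sort(key=lambda w: (round(w.y, 2), w.x))
        result[name] = " ".join(w.text for w in hits)
    return result


# Hypothetical template for one invoice layout.
INVOICE_TEMPLATE = {
    "invoice_number": Region(0.70, 0.05, 0.95, 0.10),
    "total": Region(0.70, 0.85, 0.95, 0.90),
}

# Hypothetical OCR output for one page.
words = [
    Word("INV-1042", 0.80, 0.07),
    Word("Total:", 0.60, 0.87),
    Word("$1,250.00", 0.82, 0.87),
    Word("Thank", 0.20, 0.95),
]
fields = extract_fields(words, INVOICE_TEMPLATE)
```

Telling different templates apart across a pile of documents would need an extra step on top of this, for example scoring each template by how many of its regions come back non-empty, which this sketch does not attempt.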
I'm curious: if you email me a sample, I can tell you what's possible.