by tjpnz 2399 days ago
I got asked to spec out something for a nightclub ~10 years ago for extracting text from driver's licenses and 18+ cards. I didn't go ahead with the job (the client wasn't paying much, and the way he wanted to use it was ethically and legally grey), but I did prototype something and recall getting good results from the Python OCR libraries available at the time. What advantages would you get from a deep learning approach compared with what was available back then?

It’s surprising to hear you were getting good results with Python OCR libraries 10 years ago, because there aren’t any reliable Python OCR libraries even today! :) Tesseract is very fickle and doesn’t work well in poor lighting conditions (like a nightclub).

What deep learning gives you that’s really useful and valuable (beyond better OCR) is that you can use graph convolutional networks (GCNs) to automatically parse the OCR output and convert it into structured data. You could hand-write a parser or use a template-matching approach, but you’d have to create a new parser or template for every ID card type, whereas the GCN approach learns the parser from examples.
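
To make the template problem concrete, here is a rough sketch of the hand-written approach. The layout names, field labels, and regexes are all made up for illustration (they don't correspond to any real card), and it assumes the OCR step already produced clean text:

```python
import re

# Hypothetical templates: one set of regexes per card layout.
# Every new ID type needs another hand-written entry here.
TEMPLATES = {
    "license_layout_a": {
        "name": re.compile(r"NAME[:\s]+(.+)"),
        "dob": re.compile(r"DOB[:\s]+(\d{2}/\d{2}/\d{4})"),
    },
    "age_card_layout_b": {
        "name": re.compile(r"Cardholder[:\s]+(.+)"),
        "dob": re.compile(r"Date of Birth[:\s]+(\d{4}-\d{2}-\d{2})"),
    },
}


def parse_with_template(ocr_text: str, card_type: str) -> dict:
    """Extract fields from OCR text using the template for one known card type."""
    result = {}
    for field, pattern in TEMPLATES[card_type].items():
        match = pattern.search(ocr_text)
        # A field is None when the template's pattern does not match.
        result[field] = match.group(1).strip() if match else None
    return result


# Made-up OCR output for the first layout.
SAMPLE_OCR_TEXT = "NAME: JANE SAMPLE\nDOB: 01/31/1990"
```

Running `parse_with_template(SAMPLE_OCR_TEXT, "license_layout_a")` pulls out both fields, while the same text run through `"age_card_layout_b"` comes back with every field set to `None`. You also have to know the card type up front, which is the part a learned parser is supposed to take off your hands.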

The problem with extracting information is not limited to getting OCR results. The bigger problem when building something like this is extracting the fields and understanding the structure of the document automatically. Using Python OCR libraries, you'd probably get raw text for a driver's license or a passport and then process each with separate rules written for that document type. With deep learning, a non-template solution seems possible: one that figures out which ID it is, locates the name, address, and relevant numbers, and puts them in a structure.
In the case of U.S. drivers licenses, there are standards for the 2D barcode that would make it very straightforward to parse: https://www.aamva.org/uploadedFiles/MainSite/Content/Solutio...
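
As a rough sketch of how simple that parsing can be: the barcode payload is a list of data elements, each starting with a three-letter element ID. The version below assumes the PDF417 barcode has already been decoded to a string and that the file header and subfile designator have been stripped, so each line holds one element. It maps only a handful of element IDs, and the sample values are invented, not from a real license.

```python
# A small subset of AAMVA data element IDs; the standard defines many more.
FIELD_NAMES = {
    "DAQ": "license_number",
    "DCS": "family_name",
    "DAC": "first_name",
    "DBB": "date_of_birth",
    "DBA": "expiry_date",
}


def parse_aamva_elements(payload: str) -> dict:
    """Map 'element ID + value' lines to named fields.

    Assumption: the header has been removed and each line holds one element.
    Values are kept as raw strings; dates are not converted.
    """
    fields = {}
    for line in payload.splitlines():
        code, value = line[:3], line[3:].strip()
        name = FIELD_NAMES.get(code)
        if name and value:
            fields[name] = value
    return fields


# Invented sample payload for illustration only.
SAMPLE_PAYLOAD = "DAQD1234567\nDCSSAMPLE\nDACALEX\nDBB01311990\nDBA01312030"
```

`parse_aamva_elements(SAMPLE_PAYLOAD)` gives you a dict keyed by field name with no OCR involved at all. Unknown element IDs are just skipped here; a real implementation would also need to handle the header and validate against the version of the standard the card declares.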