Hacker News
ID Card Digitization and Information Extraction using Deep Learning (nanonets.com)
49 points by ole_gooner 2399 days ago
6 comments

This is a decent OCR / structured data extraction literature review but is absolutely not "building an ID card reader from scratch with deep learning".

It's also very hand-wavy on the details of how to actually use graph convolutional networks to extract structured ID card data. For example, what "bounding box information" is used in your node representations? What is the architecture of your biLSTM?

This reads much more like a promotion for your API than useful information on how to build a system that extracts data from ID cards.

Sure, the blog might have missed some of the finer details of the different architectures. We intended to give an overview of some of the techniques used to build such information extraction models. We will definitely dive deeper into one of the architectures/models in a second part to this blog.
The article has quite a nice introduction to deep learning concepts, but the headline claim of building an ID card reader from scratch is little more than "use our API".
Ok, we've changed the title from "Building an ID card reader from scratch with deep learning" to what the article's title says.
I got asked to spec out something for a nightclub ~10 years ago for extracting text from driver's licenses and 18+ cards. I didn't go ahead with the job (the client wasn't paying much and the way he wanted to use it was ethically and legally grey), but I did prototype something and recall getting good results from the Python OCR libraries available at the time. What advantages would you get from a deep learning approach compared with what was available back then?
It’s surprising to hear you were getting good results with Python OCR libraries 10 years ago because there aren’t any reliable Python OCR libraries even today! :) Tesseract is very fickle and doesn’t work well in poor lighting conditions (like a nightclub).

What deep learning gives you that’s really useful and valuable (beyond better OCR) is that you can use graph convolutional networks to automatically parse the OCR output and convert it into structured data. You could hand-write a parser or use a template matching approach, but you’ll have to create a new parser/template for every ID card type, whereas the GCN approach can be used to learn the parser.
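
To make the idea a bit more concrete, here is a minimal sketch of a single graph convolution over OCR boxes. It is not the article's architecture: the k-nearest-neighbour graph, the raw box coordinates used as node features, the untrained random weights, and the names `knn_adjacency` and `gcn_layer` are all assumptions for illustration. A real system would use a tensor library, add text embeddings to the node features, and train the weights to classify each box as a field.

```python
import math
import random

def knn_adjacency(centers, k=2):
    """Link each OCR box to its k nearest boxes (symmetric, with self-loops)."""
    n = len(centers)
    adj = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(centers[i], centers[j]))
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 A D^-1/2 X W)."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # >= 1 everywhere because of the self-loops
    norm_adj = [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
                for i in range(n)]
    hidden = matmul(matmul(norm_adj, features), weights)
    return [[max(v, 0.0) for v in row] for row in hidden]

# Hypothetical OCR boxes as (x_center, y_center, width, height), normalised to [0, 1]
boxes = [
    [0.20, 0.10, 0.30, 0.05],  # e.g. the label "NAME"
    [0.60, 0.10, 0.40, 0.05],  # e.g. the value next to it
    [0.20, 0.30, 0.30, 0.05],  # e.g. the label "DOB"
    [0.60, 0.30, 0.40, 0.05],  # e.g. the value next to it
]
adj = knn_adjacency([box[:2] for box in boxes], k=2)
random.seed(0)
# Untrained weights mapping 4 input features to 8 hidden features
weights = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
node_embeddings = gcn_layer(adj, boxes, weights)
```

Each row of `node_embeddings` mixes a box's own features with those of its spatial neighbours, which is what lets a trained model associate a label like "NAME" with the value printed beside it without a per-card template.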

The problem with extracting information is not limited to getting OCR results. The bigger problem when building something like this is extracting the fields and understanding the structure of the document automatically. Using Python OCR libraries, you'd probably get text results for a driver's license or a passport separately and process them with separate rules written for each. With deep learning, a template-free solution seems possible: one that figures out which ID it is and where the name, address, and relevant numbers are, and puts them in a structure.
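
For contrast, the rule-per-document approach described above might look roughly like the sketch below. The label layouts, the `RULES` table, and `extract_fields` are hypothetical and only illustrate why every new document type needs its own hand-written rules.

```python
import re

# Hypothetical label layouts; real documents vary far more than this.
RULES = {
    "drivers_license": {
        "name": re.compile(r"^NAME[:\s]+(.+)$", re.MULTILINE),
        "number": re.compile(r"^DL\s*NO[.:\s]+(\w+)$", re.MULTILINE),
    },
    "passport": {
        "name": re.compile(r"^Given names?[:\s]+(.+)$", re.MULTILINE | re.IGNORECASE),
        "number": re.compile(r"^Passport No[.:\s]+(\w+)$", re.MULTILINE | re.IGNORECASE),
    },
}

def extract_fields(doc_type, ocr_text):
    """Apply the hand-written rules for one document type to raw OCR text."""
    fields = {}
    for field, pattern in RULES[doc_type].items():
        match = pattern.search(ocr_text)
        if match:
            fields[field] = match.group(1).strip()
    return fields

license_text = "NAME: JANE SAMPLE\nDL NO: D1234567"
license_fields = extract_fields("drivers_license", license_text)
```

Note that the caller has to know the document type up front, and the passport rules find nothing in the license text. That is the gap a learned, template-free model is meant to close.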
In the case of U.S. drivers licenses, there are standards for the 2D barcode that would make it very straightforward to parse: https://www.aamva.org/uploadedFiles/MainSite/Content/Solutio...
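
As a rough sketch of how simple that parsing can be: assuming the PDF417 barcode has already been decoded to text by some barcode library, and that the file header has been stripped so each data element sits on its own line, the three-letter AAMVA element IDs map directly to fields. The function name `parse_aamva_payload`, the output key names, and the sample payload are made up for illustration, and only a small subset of element IDs is listed.

```python
# Subset of AAMVA data element IDs; the standard defines many more.
FIELD_NAMES = {
    "DAQ": "license_number",
    "DCS": "family_name",
    "DAC": "first_name",
    "DBB": "date_of_birth",
    "DBA": "expiration_date",
    "DAG": "street",
    "DAI": "city",
    "DAJ": "state",
    "DAK": "postal_code",
}

def parse_aamva_payload(payload):
    """Map element IDs in an already-decoded barcode payload to named fields.

    Assumes the header has been removed and there is one element per line.
    Dates are returned as raw strings because their layout differs by issuer.
    """
    fields = {}
    for line in payload.splitlines():
        line = line.strip()
        # The first element directly follows the "DL" subfile designator.
        if line.startswith("DL") and line[2:5] in FIELD_NAMES:
            line = line[2:]
        code, value = line[:3], line[3:].strip()
        if code in FIELD_NAMES and value:
            fields[FIELD_NAMES[code]] = value
    return fields

# Made-up sample payload, not taken from a real card
sample = "DLDAQD1234567\nDCSSAMPLE\nDACJANE\nDBB01311990\nDAJCA\n"
parsed = parse_aamva_payload(sample)
```

Unknown element IDs are simply skipped, so the same function tolerates jurisdiction-specific extras.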
"CNNs versus GCNs" is not necessarily the right framing? You would need to apply a GCN on top of a CNN to get structure out of otherwise unstructured text.
The article reviews the recent deep-learning-based approaches to digitization and OCR, along with an explanation of how graph neural networks work and how they can be applied to the problem of ID card digitization.
But why not build a card reader that reads the chip content (more accurate than an image)?
The title says "ID card". You first need to read the printed text in order to be able to read the chip of an ID card.
This is just one example of a common problem; the same applies to invoices, transcripts, purchase orders, etc.