Hacker News
by shoshin23 3111 days ago
Image text recognition is a major problem we're trying to solve at our startup. I would love to be pointed to some state-of-the-art (SOTA) research in this space; it's hard to find anything by Googling.

In our experience, Google's Cloud Vision API is a killer option compared to both AWS and MSFT, though it's pricier and slower than AWS. MSFT is terrible on both price and speed.


If you're trying to handle text "in the wild" and not scanned documents, the keyword is "scene text". Most papers are focused on either detection/localization, i.e. finding the location of text, or recognition, i.e. recognizing the actual content given a cropped text image.
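That split maps onto a simple two-stage interface: a detector returns boxes, each box is cropped, and a recognizer reads the crop. A minimal sketch, assuming axis-aligned boxes and an image stored as a list of pixel rows; `detect` and `recognize` are hypothetical placeholders for whatever models you plug in (oriented-text detectors would need a rotated crop instead):

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive
Image = List[List[int]]          # grayscale image as rows of pixel values

def crop(image: Image, box: Box) -> Image:
    """Cut an axis-aligned box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def read_text(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],  # hypothetical detection model
    recognize: Callable[[Image], str],         # hypothetical recognition model
) -> List[Tuple[Box, str]]:
    """Stage 1 localizes text regions; stage 2 reads each cropped region."""
    return [(box, recognize(crop(image, box))) for box in detect(image)]
```

Keeping the two stages behind separate callables lets you swap a detector or recognizer independently, which matters because the literature (and the papers below) mostly treats them separately.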

Here are some current state-of-the-art detection papers, with code where available:

Fused Text Segmentation Networks for Multi-oriented Scene Text Detection https://arxiv.org/abs/1709.03272

EAST: An Efficient and Accurate Scene Text Detector https://arxiv.org/abs/1704.03155 https://github.com/argman/EAST

Detecting Oriented Text in Natural Images by Linking Segments https://arxiv.org/abs/1703.06520 https://github.com/dengdan/seglink

Arbitrary-Oriented Scene Text Detection via Rotation Proposals https://arxiv.org/abs/1703.01086 https://github.com/mjq11302010044/RRPN
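Detectors like these typically emit many overlapping candidate boxes, which are pruned with non-maximum suppression (NMS). EAST, for instance, describes a locality-aware NMS variant, and oriented-box methods need an overlap measure for rotated boxes. The sketch below is only plain greedy NMS on axis-aligned boxes, a baseline for intuition rather than any of these papers' exact post-processing; the (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, drop any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Returning indices rather than boxes makes it easy to carry along whatever else the detector predicted for each candidate.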

And for recognition:

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition https://arxiv.org/abs/1507.05717 https://github.com/bgshih/crnn

Robust Scene Text Recognition with Automatic Rectification https://arxiv.org/abs/1603.03915
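CRNN (the first recognition paper above) ends in a CTC transcription layer: for each horizontal frame of the cropped word the network outputs a distribution over characters plus a special "blank", and in lexicon-free mode the paper approximates the best label sequence by taking the most probable label per frame. A minimal sketch of that greedy (best-path) decoding step, assuming the per-frame probabilities are already computed and the blank sits at index 0 (both assumptions for illustration; the function name is hypothetical):

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars = []
    prev = None
    for idx in best_path:
        # A label is emitted only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)

def onehot(i, n=5):
    """Toy per-frame distribution peaked at index i (illustration only)."""
    return [0.9 if j == i else 0.025 for j in range(n)]

alphabet = ["-", "h", "e", "l", "o"]  # index 0 is the blank
frames = [onehot(i) for i in [1, 1, 2, 3, 3, 0, 3, 4, 0]]
print(ctc_greedy_decode(frames, alphabet))  # → "hello"
```

The blank between the two `l` frames is what lets the decoder emit a doubled letter; without it, `3, 3, 3` would collapse to a single `l`.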

I'd also add the following paper from Microsoft on the subject: http://digital.cs.usu.edu/~vkulyukin/vkweb/teaching/cs7900/P...
Note that this paper is from 2010, so while it was quite influential in its time, it is far from the current state of the art. The stroke width transform method it introduced is simply not as good as current deep learning-based methods.
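For readers unfamiliar with it, the core idea of the stroke width transform can be sketched as below. This is a simplified illustration, not the paper's implementation: it assumes a binary text mask (1 = text) instead of Canny edges on a grayscale image, uses Sobel gradients of that mask, and all function names are hypothetical. From each edge pixel it casts a ray along the gradient (into the stroke) until it meets an edge pixel whose gradient points roughly the opposite way (within π/6), writes the ray length into every pixel on the ray, and then clamps each ray to its median width:

```python
import math
import statistics

def _at(mask, r, c):
    """Mask value, with out-of-bounds pixels treated as background (0)."""
    return mask[r][c] if 0 <= r < len(mask) and 0 <= c < len(mask[0]) else 0

def _sobel(mask, r, c):
    """Sobel gradient (gx, gy) of the mask at (r, c); points toward text pixels."""
    gx = (_at(mask, r - 1, c + 1) + 2 * _at(mask, r, c + 1) + _at(mask, r + 1, c + 1)
          - _at(mask, r - 1, c - 1) - 2 * _at(mask, r, c - 1) - _at(mask, r + 1, c - 1))
    gy = (_at(mask, r + 1, c - 1) + 2 * _at(mask, r + 1, c) + _at(mask, r + 1, c + 1)
          - _at(mask, r - 1, c - 1) - 2 * _at(mask, r - 1, c) - _at(mask, r - 1, c + 1))
    return gx, gy

def _is_edge(mask, r, c):
    """Text pixel with at least one 4-connected background neighbour."""
    return bool(_at(mask, r, c)) and any(
        not _at(mask, r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

def stroke_width_transform(mask, max_angle=math.pi / 6):
    """Simplified SWT on a binary mask; non-stroke pixels stay at math.inf."""
    h, w = len(mask), len(mask[0])
    swt = [[math.inf] * w for _ in range(h)]
    rays = []
    for r in range(h):
        for c in range(w):
            if not _is_edge(mask, r, c):
                continue
            gx, gy = _sobel(mask, r, c)
            norm = math.hypot(gx, gy)
            if norm == 0:
                continue
            dx, dy = gx / norm, gy / norm  # unit step into the stroke
            ray, x, y = [(r, c)], float(c), float(r)
            while True:
                x, y = x + dx, y + dy
                rr, cc = math.floor(y + 0.5), math.floor(x + 0.5)
                if (rr, cc) == ray[-1]:
                    continue  # still inside the same pixel
                if not _at(mask, rr, cc):
                    break  # left the stroke without meeting an opposite edge: discard
                ray.append((rr, cc))
                if _is_edge(mask, rr, cc):
                    qx, qy = _sobel(mask, rr, cc)
                    qn = math.hypot(qx, qy)
                    # Accept only if q's gradient is within max_angle of the reversed direction.
                    if qn and (dx * qx + dy * qy) / qn <= -math.cos(max_angle):
                        width = math.hypot(cc - c, rr - r)
                        for pr, pc in ray:
                            swt[pr][pc] = min(swt[pr][pc], width)
                        rays.append(ray)
                    break
    # Second pass: clamp each ray to its median width to tame corners.
    for ray in rays:
        med = statistics.median(swt[pr][pc] for pr, pc in ray)
        for pr, pc in ray:
            swt[pr][pc] = min(swt[pr][pc], med)
    return swt
```

The appeal at the time was that letters have near-constant stroke width, so connected pixels with similar SWT values could be grouped into character candidates without any learning; that hand-crafted grouping is exactly what the deep learning detectors above replace.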

If you want an overview (slightly out of date, but what can you do, the field is moving very fast), see this survey from 2016:

Scene Text Detection and Recognition: Recent Advances and Future Trends http://mclab.eic.hust.edu.cn/UpLoadFiles/Papers/FCS_TextSurv...

Check out the post on Dropbox's custom implementation. Helping them (and the NSA) search through files :)