by appleflaxen 2654 days ago
this is so awesome!

but how is it that we have RNN solutions for handwriting when we don't even have a standard, canned RNN for OCR?

I know tesseract and related projects exist, but when I've tried them they have been fairly brittle with lower accuracy than I was expecting. Accuracy was especially problematic for letter combinations like "-ing" that would consistently be recognized as "-mg".

Is there a good ML OCR library I'm missing?

2 comments

Just a side comment: take into account that (as per the paper) there is temporal input in Gboard (i.e. the timestamp of each stroke is important).

You do not have that for a scanned "-ing", so the software does not know that the dot is “independent”.

The reason is that online OCR (this particular case) is entirely different from offline OCR.

Online OCR is when you input the strokes directly on the tablet/phone, so it becomes a sequence of XY coordinates with an associated timestamp. It takes into account where you start and where you end the stroke on the canvas, along with the intermediate points (information galore).
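To make that concrete, here is a rough sketch of what online input might look like as data. The `Point` and `to_deltas` names and the (dx, dy, dt, pen_lifted) encoding are my own illustration, not the representation used in the paper; the point is only that the dot of an "i" arrives as its own stroke with its own timing.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Point:
    x: float
    y: float
    t: float  # seconds since the user started writing


# One stroke = the points sampled between pen-down and pen-up.
Stroke = List[Point]


def to_deltas(strokes: List[Stroke]) -> List[Tuple[float, float, float, int]]:
    """Flatten strokes into (dx, dy, dt, pen_lifted) steps.

    pen_lifted is 1 when the step jumps to the first point of a new stroke,
    i.e. the pen left the surface in between. (Hypothetical encoding.)
    """
    steps = []
    prev = None
    for stroke in strokes:
        for i, p in enumerate(stroke):
            if prev is not None:
                lifted = 1 if i == 0 else 0
                steps.append((p.x - prev.x, p.y - prev.y, p.t - prev.t, lifted))
            prev = p
    return steps


# The letter "i": a vertical stem, then a separate dot drawn a bit later.
stem = [Point(0.0, 0.0, 0.00), Point(0.0, 1.0, 0.10)]
dot = [Point(0.0, 1.5, 0.30), Point(0.0, 1.5, 0.32)]
steps = to_deltas([stem, dot])
```

The second step carries `pen_lifted == 1`, which is exactly the "this dot is independent" signal that a photo of the same letter does not give you.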

Offline OCR is when you take a photo of your handwriting in your notebook, so you just get the raw pixels of an image. In offline OCR, you'd also have to properly segment and binarize the image before the OCR step.
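As an illustration of the binarize step only (segmentation is a separate problem), here is a minimal pure-Python sketch using Otsu's method to pick a global threshold. The function names are mine, it assumes 8-bit grayscale with dark ink on light paper, and a real pipeline would more likely use an image library and possibly a local/adaptive threshold instead.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 0..255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue  # no pixels at or below t yet
        w_light = total - w_dark
        if w_light == 0:
            break  # every pixel is at or below t
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Map a grayscale image to 1 (ink) / 0 (paper)."""
    flat = [v for row in image for v in row]
    t = otsu_threshold(flat)
    return [[1 if v <= t else 0 for v in row] for row in image]


# Tiny made-up "photo": bright paper with two dark ink pixels.
gray = [
    [200, 210, 30],
    [220, 25, 205],
]
mask = binarize(gray)
```

Everything the online case gets for free (stroke order, pen lifts, timing) has to be guessed from a mask like this in the offline case.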

With that being said, tesseract (version 4) uses an LSTM.