Hacker News
by mdani 1740 days ago
Does anyone know of any OCR engine (closed source included) that can handle Nastaliq script for Urdu, Farsi, etc.? I think Tesseract can't do this today because of the complex ligatures.
2 comments

I've used the Google Vision API on a wide variety of Arabic fonts, and it has worked pretty well at recognising ligatures. Diacritics are the weak spot: it either fails to recognise them or adds non-existent ones.
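If the diacritics are the unreliable part, one possible workaround is to strip them from the OCR output before searching or comparing text. The snippet below is only a rough sketch, not anything the Vision API provides: the function names `strip_harakat` and `same_ignoring_harakat` are made up for illustration, and the set of code points treated as diacritics (U+064B to U+0652 plus U+0670) is an assumption that may need adjusting for Urdu or Farsi text.

```python
import re

# Assumption: treat only the common Arabic short-vowel marks (harakat),
# shadda, sukun, and superscript alef as diacritics to discard.
# Hamza and madda marks are deliberately left alone because they can
# change which letter is meant.
_HARAKAT = re.compile("[\u064B-\u0652\u0670]")


def strip_harakat(text: str) -> str:
    """Return text with the diacritics listed above removed."""
    return _HARAKAT.sub("", text)


def same_ignoring_harakat(a: str, b: str) -> bool:
    """Compare two strings while ignoring those diacritics."""
    return strip_harakat(a) == strip_harakat(b)


if __name__ == "__main__":
    # A vocalised word and its bare form, written as escapes to avoid
    # any rendering ambiguity.
    vocalised = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
    bare = "\u0645\u062D\u0645\u062F"
    print(same_ignoring_harakat(vocalised, bare))
```

This loses information, so it only makes sense when the diacritics are not needed downstream, for example when matching OCR output against an unvocalised word list.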
Uighursoft had an OCR app that handled all kinds of RTL (right-to-left) texts. Give that a try.