Does anyone know of an OCR engine (closed source included) that can handle Nastaliq script for Urdu, Farsi, etc.? Tesseract can't do this today, I think because of the complex ligatures.
I've used the Google Vision API on a wide variety of Arabic fonts, and it has worked pretty well at recognising ligatures. Diacritics are another matter: it either fails to recognise them or adds ones that aren't there.
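
If the diacritics aren't needed downstream (search or indexing, say), one possible workaround is to strip them from the OCR output so that missing and spurious marks stop mattering. The snippet below is only a rough sketch using Python's standard `unicodedata` module; it is not part of the Vision API, `strip_diacritics` is a hypothetical helper name, and it assumes the recognised text is already available as a plain string.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop every Unicode nonspacing mark (general category 'Mn').

    This covers the Arabic short-vowel marks (fatha, kasra, damma, sukun,
    shadda, tanwin). Caveat: it also drops combining hamza/madda if the
    OCR engine emits them as separate code points, so check the output
    on real Urdu/Farsi text before relying on it.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Hypothetical OCR output: kaf, ta, ba, each followed by a fatha (U+064E).
ocr_text = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_diacritics(ocr_text)
```

The text is deliberately not decomposed first, so precomposed letters such as alef with madda (U+0622) pass through unchanged.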