


	by mdani 1740 days ago
	Does anyone know of an OCR engine (closed source included) that can handle Nastaliq script for Urdu, Farsi, etc.? Tesseract can't do this today, I think because of the complex ligatures.

2 comments

nethunters 1740 days ago

I've used the Google Vision API on a wide variety of Arabic fonts, and it has worked pretty well at recognising ligatures. Diacritics are another matter: it either doesn't recognise them or adds ones that aren't there.
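If the diacritics are not essential for your use, one possible workaround is to strip them from the OCR output before comparing or indexing it, so that missed or spurious marks stop mattering. The following is only a minimal sketch using Python's standard `unicodedata` module; the function name `strip_arabic_diacritics` and the sample strings are hypothetical, and it assumes you are happy to drop every nonspacing combining mark, which may be too aggressive for text where those marks carry meaning.

```python
import unicodedata


def strip_arabic_diacritics(text: str) -> str:
    """Drop nonspacing combining marks (Unicode category Mn).

    The Arabic harakat (fatha, damma, kasra, shadda, sukun, tanwin)
    all fall in this category. The text is deliberately NOT
    decomposed first, so precomposed letters such as U+0622
    (alef with madda) are left untouched.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Hypothetical sample: the same word with and without harakat.
with_marks = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"
without_marks = "\u0645\u062D\u0645\u062F"

# After stripping, the two strings compare equal.
print(strip_arabic_diacritics(with_marks) == without_marks)
```

This only hides the problem for search or comparison purposes; it does not recover diacritics the OCR got wrong.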


wnscooke 1740 days ago

Uighursoft had an OCR app that handled all kinds of RTL texts. Give that a try.
