


	by acdha 3843 days ago
	Is the problem really Tesseract, or is it the lack of a robust front end performing segmentation, de-skewing, better binarization, etc.? I've heard that Google Books actually uses the Tesseract engine but gets better results, partly from better training but mostly from a more advanced system that breaks each page into the blocks of text which are actually OCRed.
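
To make "better binarization" a bit more concrete, here is a minimal sketch of one common approach, Otsu's global threshold, in plain Python. It is only an illustration of the kind of preprocessing step meant here, not what Tesseract or Google Books actually does; the names `otsu_threshold` and `binarize` are made up for this example, and the input is assumed to be 8-bit grayscale values (0-255) stored as a list of rows.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for an iterable of 8-bit gray values (0-255).

    Hypothetical helper for illustration: picks the threshold that maximizes
    the between-class variance of "dark" and "light" pixels.
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    if total == 0:
        raise ValueError("empty image")
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no dark pixels yet, nothing to split
        w_fg = total - w_bg
        if w_fg == 0:
            break     # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor of 1 / total**2).
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (list of rows) to 0 (ink) / 1 (paper)."""
    t = otsu_threshold(p for row in image for p in row)
    return [[1 if p > t else 0 for p in row] for row in image]
```

A single global threshold like this tends to struggle with uneven lighting or stained pages, which is one reason a more elaborate front end (adaptive thresholding, de-skewing, block segmentation) can matter more than the recognition engine itself.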