| HN Mirror

	by viig99 2757 days ago
	No, open-source OCR doesn't work that well. I work for a telecom company, and we process millions of documents a month. We built everything in-house and can now process them at almost 40 cents per 1,000 documents. It's a long process for large documents like payslips, which require text boundary detection, word identification, spatial clustering, and writing parsers (which depend on word, segment, and clustering probabilities) that extract the required fields from the documents.
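To make the last two stages concrete, here is a rough sketch of spatial clustering followed by a probability-aware parser. It is not the in-house system described above. The `Word` record, the `y_tol` and `min_conf` thresholds, the "label on the left, value on the right" layout, and the sample payslip words are all assumptions made for illustration, and a real pipeline would get the boxes and confidences from its own detection and recognition stages.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float     # left edge of the word box (hypothetical units: pixels)
    y: float     # vertical center of the word box
    conf: float  # recognizer confidence in [0, 1]


def cluster_lines(words, y_tol=5.0):
    """Group words into text lines by vertical proximity.

    A word joins the current line if its y is within y_tol of that
    line's running mean y; otherwise it starts a new line.
    """
    lines = []
    for w in sorted(words, key=lambda w: w.y):
        if lines and abs(w.y - lines[-1]["y"]) <= y_tol:
            line = lines[-1]
            line["words"].append(w)
            line["y"] = sum(v.y for v in line["words"]) / len(line["words"])
        else:
            lines.append({"y": w.y, "words": [w]})
    # Within each line, restore left-to-right reading order.
    return [sorted(l["words"], key=lambda w: w.x) for l in lines]


def extract_field(lines, label, min_conf=0.6):
    """Return (value, confidence) for the text to the right of `label`.

    The field confidence is the weakest word confidence involved, so one
    badly recognized word is enough to reject the whole field.
    """
    for line in lines:
        for i, w in enumerate(line):
            if w.text.lower().rstrip(":") == label.lower():
                rest = line[i + 1:]
                if not rest:
                    continue
                conf = min(v.conf for v in [w] + rest)
                if conf >= min_conf:
                    return " ".join(v.text for v in rest), conf
    return None


# Made-up recognizer output for a tiny payslip, in arbitrary order.
words = [
    Word("1,234.50", 120, 101, 0.91),
    Word("Gross:", 10, 100, 0.98),
    Word("Net:", 10, 130, 0.97),
    Word("987.65", 120, 131, 0.88),
    Word("Employee:", 10, 70, 0.99),
    Word("J.", 120, 69, 0.93),
    Word("Doe", 140, 70, 0.95),
]

lines = cluster_lines(words)
print(extract_field(lines, "net"))
print(extract_field(lines, "employee"))
```

Taking the minimum confidence is a deliberately conservative choice: fields that fall below the threshold return `None` and can be routed to a fallback parser or manual review instead of being silently accepted.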