HN Mirror



	by trimber 5876 days ago
	I believe most enterprise systems are fairly expensive. Have you considered building your own? I did something similar by writing a plugin for Google Desktop Search that indexes TIF files. Writing a plugin is pretty straightforward. As for the OCR part, there are only a few open source OCR engines, the most popular being Tesseract. The quality of Tesseract's results is pretty good, and I believe it is sufficient for many systems.
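As a rough illustration of the build-your-own approach described above, here is a minimal Python sketch that OCRs a set of TIF files and builds a simple word index over the results. It is not the Google Desktop Search plugin mentioned in the comment. It assumes a `tesseract` binary on the PATH that accepts `stdout` as its output base, and the function names (`ocr_with_tesseract`, `build_index`, `search`) and the `scans` folder are hypothetical.

```python
import re
import subprocess
from collections import defaultdict
from pathlib import Path


def ocr_with_tesseract(image_path):
    """Run the tesseract CLI on one image and return the recognized text.

    Assumption: `tesseract` is installed and accepts `stdout` as output base.
    """
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_index(image_paths, ocr=ocr_with_tesseract):
    """Map each lowercase word to the set of files whose OCR text contains it."""
    index = defaultdict(set)
    for path in image_paths:
        for word in re.findall(r"[a-z0-9]+", ocr(path).lower()):
            index[word].add(str(path))
    return index


def search(index, query):
    """Return the sorted list of files that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical folder of scanned documents.
    tif_files = sorted(Path("scans").glob("*.tif"))
    index = build_index(tif_files)
    print(search(index, "invoice 2009"))
```

The OCR step is passed in as a parameter so a different engine or wrapper can be swapped in without touching the indexing code. A real system would also want to persist the index and cope with OCR errors in the recognized words.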

1 comment

keefe 5876 days ago

Raw Tesseract is a pain to use. OCRopus (http://code.google.com/p/ocropus/) is much easier to deal with and uses Tesseract as its default engine. It is a Google-funded project, btw.
