by trimber 5876 days ago
I believe most enterprise systems are fairly expensive. Have you considered building your own system? I have done something similar by writing a plugin for Google Desktop Search that indexes TIFF files, and writing the plugin was pretty straightforward. For the OCR part, there are only a few open-source OCR engines, the most popular being Tesseract. The quality of Tesseract's output is pretty good, and I believe it is sufficient for many systems.
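A rough sketch of the OCR-then-index pipeline described above, in Python. It does not use the Google Desktop Search plugin API. Instead, a plain in-memory inverted index stands in for whatever the search engine would receive from a plugin. The function names (`tesseract_ocr`, `build_index`, `search`) are hypothetical. The only external assumption is a `tesseract` command-line binary (3.03 or later) on PATH, where the special output base `stdout` prints the recognized text instead of writing a file.

```python
import re
import subprocess
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def tesseract_ocr(path: str) -> str:
    """Run the Tesseract CLI on one image (e.g. a TIFF) and return its text.

    Assumes a `tesseract` binary on PATH; "stdout" as the output base
    tells it to print the result rather than write outputbase.txt.
    """
    result = subprocess.run(
        ["tesseract", path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_index(paths: Iterable[str],
                ocr: Callable[[str], str] = tesseract_ocr) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of file paths whose OCR text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        for word in re.findall(r"\w+", ocr(path).lower()):
            index[word].add(path)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the paths that contain every word in the query (AND semantics)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

The `ocr` parameter is injectable, so the same indexing code could sit behind a different engine (for example, a wrapper such as OCRopus) without changes. For a quick test without Tesseract installed, you can pass a dictionary's `.get` method as a fake OCR function.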

Raw Tesseract is a pain to use. OCRopus (http://code.google.com/p/ocropus/) is much easier to deal with and uses Tesseract as its default engine. It's a Google-funded project, btw.