HN Mirror


by zo1 4054 days ago

And they cap the number of scanned pages per year on the corporate/server offerings. Doubly so for all their dev/API license options.

And the kicker: you can't buy those licenses from them directly. They put you in contact with some random local distributor that holds a monopoly and usually has mandatory "training" charges.

Messy, and I wouldn't touch it with a ten-foot pole.

1 comments

totalcookie 4054 days ago

If there is no open-source/free software of the same quality, what then? What are you using as a server-side OCR system on Linux? I'm certainly not good enough to write my own OCR that beats Abbyy.


zo1 4054 days ago

As others in the thread have mentioned: treat it as a computer-vision problem first, and segment the page into clean pieces for Tesseract to work on. Add some good training data, and possibly human validation if that's feasible.

All do-able within Linux.
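
To make the segmentation idea concrete, here is a rough sketch, not anything from a real pipeline. It assumes the page is already binarized into a grid of 0/1 pixels (1 = ink) and splits it into horizontal text bands wherever blank rows separate them. The names `split_rows`, `crop_bands`, and `min_gap` are made up for illustration; each crop would then be handed to Tesseract, and that call is left out here.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixels: 1 = ink, 0 = background


def split_rows(image: Image, min_gap: int = 1) -> List[Tuple[int, int]]:
    """Find bands of rows that contain ink.

    Returns (top, bottom) pairs with bottom exclusive. A band ends once
    at least `min_gap` consecutive blank rows follow it.
    """
    bands: List[Tuple[int, int]] = []
    start = None      # first row of the band currently being tracked
    blank_run = 0     # consecutive blank rows seen inside that band
    for y, row in enumerate(image):
        if any(row):
            if start is None:
                start = y
            blank_run = 0
        elif start is not None:
            blank_run += 1
            if blank_run >= min_gap:
                # Close the band at the first blank row of this run.
                bands.append((start, y - blank_run + 1))
                start = None
                blank_run = 0
    if start is not None:
        # Page ended mid-band; drop any trailing blank rows.
        bands.append((start, len(image) - blank_run))
    return bands


def crop_bands(image: Image, bands: List[Tuple[int, int]]) -> List[Image]:
    """Cut the image into one sub-image per band."""
    return [image[top:bottom] for top, bottom in bands]


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [1, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    for band in split_rows(page):
        print(band)
```

Real scans need deskewing and noise removal before a projection split like this is reliable, and a larger `min_gap` keeps dotted letters and accents attached to their line.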


totalcookie 4054 days ago

As the parent of this comment thread mentions, Tesseract is not great for mass usage because of its error rate; Abbyy is much better. I would be interested in experience, not opinion.
