by aidenn0
1086 days ago
FWIW, I just tried EasyOCR on some sans-serif text and Tesseract5 absolutely blew it out of the water. The only thing Tesseract got wrong that EasyOCR (sometimes) got right was the uppercase I, which Tesseract recognized as a vertical bar ("|") pretty much 100% of the time. Since my text of interest is extremely unlikely to contain any vertical bar characters, a simple sed post-processing stage fixed that.

- Tesseract5 demolished EasyOCR on paragraph detection, getting it 100% right on the 10 pages I checked. EasyOCR missed most of the paragraph breaks.
- Tesseract got most of the punctuation correct; EasyOCR only got apostrophes and two double-quotes (out of 14) correct. Every single period, comma, exclamation mark, and hyphen was missing or wrong, as were most of the double-quotes. Some question marks were recognized, but with garbage after them.
- In general EasyOCR seems to just add in square closing brackets ("]") where none are
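
For anyone wanting to replicate the post-processing step: the comment doesn't give the actual sed command, so this is only a rough Python sketch of the same idea (roughly what `sed 's/|/I/g'` would do). The function name `fix_vertical_bars` is made up for illustration, and it assumes, as above, that the text never contains a genuine vertical bar.

```python
def fix_vertical_bars(text: str) -> str:
    """Replace every vertical bar with an uppercase I.

    Assumption: the source text contains no real "|" characters,
    so every one in the OCR output is a misread "I".
    """
    return text.replace("|", "I")


if __name__ == "__main__":
    # Hypothetical OCR output, not taken from the pages tested above.
    sample = "| think | left it at home."
    print(fix_vertical_bars(sample))
```

A blanket replace like this is only safe because of that assumption; text that legitimately uses "|" (tables, shell commands) would need something more careful.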