Hacker News
by talldayo 670 days ago
I think Tesseract is the smarter/faster/less obnoxious choice if you're not trying to parse weird meme text like the blog is doing. There's almost certainly a better paid option available in our enlightened AI age, but I don't even think you'd need AI for this use-case.
5 comments

Last time I used Tesseract (a year ago?), it was still pretty useless if your text wasn’t on a clean background. It doesn’t even come close to Apple’s proprietary on-device OCR.
There is a whole page on their site dedicated to methods for improving the accuracy: https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

I think most frontends to tesseract employ a lot of these methods and maybe more... but trying to use tesseract directly can indeed be difficult without extra processing of the image first.
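To make "extra processing of the image first" concrete, here is a minimal sketch of one step the linked page covers, binarisation, using Otsu's method in plain Python. The function names `otsu_threshold` and `binarize` are made up for illustration, the image is assumed to be a list of rows of 0-255 grayscale values, and this is not a claim about what any particular Tesseract frontend actually does. A real pipeline would more likely use an image library and add rescaling, denoising and deskewing.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 0-255 grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no pixels in the lower class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is in the lower class; nothing left to split
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximises it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(rows):
    """Map each pixel to 0 (dark) or 255 (light) using the Otsu threshold."""
    flat = [p for row in rows for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in rows]


# Hypothetical 2x4 patch: dark "ink" pixels on a mid-gray background.
patch = [
    [160, 158, 35, 162],
    [155, 30, 40, 150],
]
clean = binarize(patch)
```

The binarised result would then be saved as an image and handed to the OCR engine. A single global threshold like this tends to struggle with uneven lighting, which fits the complaint above about text that isn't on a clean background.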

I know. I tried many things with the photo collection I was working with, including advice from that very page, generally with relatively poor results. (I ended up using Apple’s framework on macOS.) The point is that Tesseract is definitely not “smarter” in any way; at best it’s on par with Apple’s OCR when you hand it very clean text.
The Apple framework is much, much better than Tesseract, and quicker as well. It is really good. Of course if you don’t need on-device processing, then there are cloud services that are better.
> I think Tesseract is the smarter/faster/less obnoxious choice [...] There's almost certainly a better paid option available in our enlightened AI age

It would have cost $375,000 to use cloud OCR for this project. Mandatory is absolutely a baller, but not crazy enough to spend that kind of money on the project.

If you can get Tesseract to generate comparable results with sub-optimal images from eBay listings, I'd love to know more.

I have tried to use Tesseract to extract serial numbers from photos of iPhone boxes and had a 100% failure rate.

I have then employed a multimodal LLM and had a 100% success rate.

I have seen some models combine an LLM with OCR to improve both.

Considering what apps like Notes can do low-key on iOS… I wouldn’t be surprised if more capability existed.

IIRC, Apple was holding back improvements to Siri and other tech.