by kkielhofner 940 days ago

Tesseract alone is widely known to be "meh" at this point.

If you look at RAG frameworks as one example, they'll typically use/support a variety of implementations. Tesseract is almost always supported, but it's rarely ideal, with projects like Unstructured[0] and docTR[1] being preferred. By leveraging more-or-less SOTA vision models[2][3], they embarrass Tesseract.

I haven't compared them to the Apple Vision framework, but they're absolutely better than Tesseract and potentially even better than Apple Vision.

There are also various approaches to using these in conjunction, but that gets involved.

[0] - https://github.com/Unstructured-IO/unstructured-inference

[1] - https://github.com/mindee/doctr

[2] - https://github.com/mindee/doctr#models-architectures

[3] - https://github.com/Unstructured-IO/unstructured-inference#mo...

fancy_pantser 940 days ago

https://github.com/mindee/doctr/issues/1049

https://github.com/JaidedAI/EasyOCR#whats-coming-next

Happy to see OCR is advancing lately, but I really need HWR.

I am looking for something this polished and reliable for handwriting. Does anyone have any pointers? I want to integrate it into a workflow with the e-ink tablet I take notes on. A few years ago, I tried various models, but they performed poorly (around 80% accuracy) on my handwriting, which I can read almost 90% of the time.
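
For anyone who wants to put a number on "accuracy" for their own handwriting, here is a minimal sketch. It assumes accuracy means character-level accuracy (1 minus the character error rate, computed from edit distance against a hand-typed transcription). That is a common convention, not necessarily how the figure above was measured. The function names and sample strings are made up for illustration, and it uses only the standard library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - CER, clamped at 0. `reference` is the ground-truth text."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    cer = edit_distance(reference, hypothesis) / len(reference)
    return max(0.0, 1.0 - cer)


if __name__ == "__main__":
    truth = "meet at noon"       # what was actually written (hypothetical)
    recognized = "meet at moon"  # what the recognizer returned (hypothetical)
    print(f"{char_accuracy(truth, recognized):.3f}")
```

Running the same handful of transcribed pages through each candidate engine and comparing this score gives a rough but repeatable way to choose between them.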

Someone 940 days ago

Reading https://heartbeat.comet.ml/comparing-apples-and-google-s-on-... (2017), I expect this code to work for handwritten text.

How well it works on your handwriting is for you to test, but if you, having all kinds of contextual information, can’t read it well, I guess it won’t, either.

riveducha 940 days ago

This is maybe not a solution, but how does ChatGPT do on your handwriting if you upload a photo? If that works well then maybe you can use the API?

animal_spirits 939 days ago

AWS Textract is by far the best OCR engine we've used; it does great with handwritten text.

mcbetz 940 days ago

I found this detailed comparison of OCRs (both open source and cloud services) super helpful: https://source.opennews.org/articles/our-search-best-ocr-too...

docTR comes out as the strongest open solution.

haolez 940 days ago

Looks nice! Do you know if they can do table structuring as well? Something similar to what Amazon Textract does[0].

[0] - https://docs.aws.amazon.com/textract/latest/dg/how-it-works-...

beembeem 940 days ago

I have found Tesseract to be both better than I expect (it feels great when it works most of the time) and worse than I expect (not quite enough correct data to fully rely on).

mdani 940 days ago

Does anyone know what languages Apple supports? The docs don't have a list. Tesseract might be "meh", but it is probably the best open source option available for Devanagari scripts or Persian, for example.

lelandfe 940 days ago

I've used it on a number of Cyrillic languages (Russian, Bulgarian, etc.), Hungarian, and Turkish, along with the typical ones (Spanish, German, French, Italian, Portuguese). I've heard it supports Chinese. I just tried Persian and Devanagari samples on my Mac and it could not do either.
