by krapht 399 days ago
I just built a pipeline with tesseract last year. What's better that is open source and runnable locally?

VLM hallucination is a blocker for my use case.

2 comments

If you are stuck with open source, then your options are limited.

Otherwise I'd say just use your operating system's OCR API. Both Windows and macOS have excellent APIs for this.

How is a hallucination worse than a Tesseract error?
Because the VLM doesn't know it hallucinated. When you get a Tesseract error you can flag the OCR job for manual review.
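For what it's worth, here is a rough sketch of what "flag the job for manual review" could look like. It leans on the fact that Tesseract reports a per-word confidence from 0 to 100 (with -1 on layout rows that are not words). The function names, the 60 cutoff, and the sample data are all my own assumptions, not anything from the pipeline discussed above; the input dict is assumed to have the shape returned by pytesseract's `image_to_data(..., output_type=Output.DICT)`.

```python
from typing import Dict, List


def words_needing_review(data: Dict[str, list], min_conf: float = 60.0) -> List[str]:
    """Return recognized words whose confidence is below min_conf.

    `data` is assumed to carry parallel "text" and "conf" lists, as in
    pytesseract.image_to_data(img, output_type=pytesseract.Output.DICT).
    The 60.0 default is an arbitrary illustrative cutoff, not a recommendation.
    """
    flagged = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # confidences may arrive as strings or numbers
        if conf < 0 or not str(text).strip():
            continue  # -1 marks page/block/line rows rather than words
        if conf < min_conf:
            flagged.append(text)
    return flagged


def needs_manual_review(data: Dict[str, list], min_conf: float = 60.0,
                        max_flagged: int = 0) -> bool:
    """Flag the whole OCR job if more than max_flagged words look shaky."""
    return len(words_needing_review(data, min_conf)) > max_flagged


# Hand-made sample standing in for real Tesseract output (hypothetical values).
sample = {
    "text": ["", "Invoice", "t0tal", "42.00"],
    "conf": ["-1", "96", "31", "88"],
}

if __name__ == "__main__":
    print(words_needing_review(sample))
    print(needs_manual_review(sample))
```

A low confidence score does not catch every misread, and a high one does not guarantee a correct word, so the threshold would need tuning on your own documents. The point is only that classic OCR gives you a signal to route on.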
Hallucinations are hard to detect unless you are a subject-matter expert. I don't have direct experience with Tesseract error detection.
The latter is more likely to get debugged.
It could hallucinate obscene language, something which is less likely with classic OCR.