by daveguy
134 days ago
> which is supposed to be somewhat universal in correcting similar OCR'd PDFs

Xerox would like a word. https://news.ycombinator.com/item?id=29223815

Point being, "correcting" to "correct looking" may be worse than just accepting errors. Errors are often clearly identified by humans as a nonsense word. "Correcting" OCR can result in plausible, but wrong results that are more difficult for the human in the loop to identify.

So yes, the "fixed" output has errors, but it’s not hallucinating details like an LLM, nor is it trying to produce output that conforms to any linguistic or stylistic heuristics.
The phrase "correcting similar OCR'd PDFs" should have been "correcting similar OCR'd base 64 representations of PDFs".