by quinndupont
71 days ago
Very helpful analysis that confirms everything I’ve encountered. OCR remains a thorny issue. The author talks about professional workflows struggling with tables and such, but I’ve found it challenging to get clean copies of long documents (books). The hybrid workflow (layout then OCR) sounds promising.

Imagine an LLM that "corrects" 898,00 to 888,00. It feels like the David Kriesel Xerox case. Still, it's an interesting way to think about optical character recognition.