by pilooch
488 days ago
The question is: what is the OCR for? If it's to answer questions and work with a document, then VLMs do actually contain self-correcting mechanisms. That is, the end-to-end mapping from image + text input to text output is statistically grounded by training.
So the question to ask is: what do you need OCR for? Feeding an LLM? Then feed the document to the VLM instead. Some other usage? Well, to be decided.
But by now, CTC and LSTMs are done with, because VLMs do everything: finding the area to read, reading, embedding, and answering.
OCR was an intermediate step, and it's going away.