by katzinsky
681 days ago
I've had very poor results using LLaVA for OCR. It's slow and usually can't transcribe more than a few words. I think this is because it just uses CLIP to encode the image into a single embedding vector for the LLM. The latest architecture is supposed to improve this, but there are better architectures if all you want is OCR.