HN Mirror



	by katzinsky 681 days ago
	I've had very poor results using LLaVA for OCR. It's slow and usually can't transcribe more than a few words. I think this is because it just uses CLIP to encode the image into a single embedding vector for the LLM. The latest version of the architecture is supposed to improve on this, but if all you want is OCR, there are better architectures.
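The bottleneck suspected above can be illustrated with a minimal sketch. It is not LLaVA's actual code. It assumes a 336x336 input split into 14-pixel patches (a ViT-L/14-style setup), and it uses random vectors as stand-ins for real CLIP features. The point is that averaging per-patch embeddings into one vector is permutation-invariant. The pooled vector therefore cannot record where each patch sat in the image, and that layout is exactly what transcription depends on.

```python
# Hypothetical illustration, not LLaVA's implementation: compare a grid of
# per-patch embeddings with a single mean-pooled embedding.
# Image size, patch size, and embedding width are assumed example values.
import math
import random


def patch_grid_shape(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return the number of patches along each side of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side, side


def mean_pool(patches: list[list[float]]) -> list[float]:
    """Collapse per-patch vectors into one vector by averaging each dimension.

    math.fsum is correctly rounded, so the result does not depend on the
    order of the patches. This makes the permutation invariance exact.
    """
    n = len(patches)
    dim = len(patches[0])
    return [math.fsum(p[d] for p in patches) / n for d in range(dim)]


if __name__ == "__main__":
    rows, cols = patch_grid_shape(image_size=336, patch_size=14)
    num_patches = rows * cols
    dim = 8  # tiny width for the demo; real encoders are far wider

    rng = random.Random(0)
    patches = [[rng.random() for _ in range(dim)] for _ in range(num_patches)]
    pooled = mean_pool(patches)

    print(f"per-patch tokens: {num_patches} x {dim}")
    print(f"pooled vector:    1 x {len(pooled)}")

    # Scrambling the patch order (i.e. where each patch came from) leaves
    # the pooled vector identical: the spatial layout is gone.
    shuffled = patches[:]
    rng.shuffle(shuffled)
    print("pooled vector unchanged after shuffling:", mean_pool(shuffled) == pooled)
```

With these assumed sizes, the per-patch path keeps 576 positioned tokens, while the pooled path keeps one vector that is identical for any rearrangement of those patches.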