| HN Mirror



	by bo0tzz 801 days ago
	CLIP has no explicit OCR support, but it does pick up a slight understanding of text, somewhat by coincidence: some training captions contain (part of) the text that appears in the image.


osmarks 800 days ago

I think the SigLIP models' dataset (WebLI) includes OCRed text too, so they have very good text understanding. I tested a bunch of things for my own meme search engine.
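
For anyone wondering what the search side of this looks like, here is a rough sketch of ranking images against a text query by cosine similarity of embeddings. It is not the code from my search engine or anything from CLIP/SigLIP themselves: the file names, the 3-dimensional vectors, and the `rank_images` helper are all made up for illustration, and a real setup would get the embeddings from the model's image and text encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_images(query_embedding, image_embeddings):
    """Return image names sorted from most to least similar to the query."""
    scored = [(cosine(query_embedding, vec), name)
              for name, vec in image_embeddings.items()]
    return [name for _, name in sorted(scored, reverse=True)]

# Hypothetical toy embeddings standing in for real model outputs.
image_embeddings = {
    "cat_meme.png": [0.9, 0.1, 0.0],
    "stonks_meme.png": [0.1, 0.9, 0.2],
    "blank.png": [0.0, 0.0, 1.0],
}

# Pretend this vector is the text encoder's output for the query "stonks".
query_embedding = [0.2, 0.8, 0.1]

ranking = rank_images(query_embedding, image_embeddings)
print(ranking)
```

If the model has picked up text understanding, an image containing the word "stonks" should land near the text embedding of "stonks", which is what makes this kind of lookup work for memes.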


osmarks 800 days ago

(https://arxiv.org/pdf/2209.06794.pdf, page 20.)
