| HN Mirror



	by vov_or 1203 days ago
	The difference is not only in the data source but in the pre-training tasks as well. But you are right: models fine-tuned on human-annotated data are far better at image retrieval than zero-shot (pre-trained only) models. This holds for CLIP, ALBEF, VICHA, and UFORM.

1 comment

Any plans to document how to fine-tune your models, then?

It will take some time, but yes, we have this in our plans.