by snordgren 1036 days ago
As someone who is out of the loop but could use high-quality image embeddings right now, what's the best CLIP model currently available?

It really depends on what you're trying to achieve. If you want to build a semantic image search, a small or base model should be fine. I think bigger models usually encode too much information, which makes the embedding space too difficult to interpret for simple measures like cosine similarity. If you want to condition a generative model, a bigger model should provide more information about the prompt or the image.
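For the semantic-search case, the ranking step is just cosine similarity between a query embedding and the stored image embeddings. The sketch below shows only that step and assumes the embeddings were already computed by some CLIP model (for example via `get_image_features` / `get_text_features` on a Hugging Face `transformers` `CLIPModel`). The names `search` and `index`, and the 3-dimensional toy vectors, are hypothetical stand-ins. Real CLIP embeddings have hundreds of dimensions (512 for ViT-B/32).

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def search(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored image embeddings by cosine similarity to the query (hypothetical helper)."""
    scored = [(image_id, cosine_similarity(query, emb)) for image_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:top_k]


# Toy, made-up embeddings standing in for real CLIP image features.
index = {
    "cat.jpg": [0.9, 0.1, 0.0],
    "dog.jpg": [0.7, 0.6, 0.1],
    "car.jpg": [0.0, 0.2, 0.95],
}
query = [1.0, 0.0, 0.0]  # stand-in for a CLIP text embedding of the search prompt

results = search(query, index, top_k=2)
print([image_id for image_id, _ in results])  # → ['cat.jpg', 'dog.jpg']
```

For a large collection you would normally L2-normalize every embedding once, so that cosine similarity reduces to a dot product, and then use a matrix multiply or a vector index instead of a Python loop.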
SDXL uses OpenCLIP, plus OpenAI CLIP as a second text encoder, basically as a backup that lets it spell words properly, but I think you could replace the second one.