| HN Mirror

	by daemonologist 206 days ago
	For pure image embedding, I find DINOv3 to be quite good. For multimodal embedding, maybe RzenEmbed. For captioning I would use a regular multimodal LLM, Qwen 3 or Gemma 3 or something, if your compute budget allows.