HN Mirror



	by GaggiX 1143 days ago
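	The CLIP text encoder is trained only to align its pooled output (a single vector, taken from the end-of-text token) with the pooled image embedding, which is why most of the per-token embeddings are not very meaningful on their own (although they still convey the overall semantics of the text). With T5, every token embedding matters: it is pretrained as an encoder-decoder, so the decoder reads all of the encoder's per-token outputs.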
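	A toy NumPy sketch of the point above, under a few stated assumptions. The CLIP-style pooling follows OpenAI's reference code (keep the final hidden state at the end-of-text token, found via argmax over token ids, then project it). The "T5-style" consumer is assumed to be cross-attention over all token states, as in Imagen-style conditioning, and is simplified to a single head with no learned projections. The token ids, sizes, and random weights are hypothetical. Only final-layer outputs are modeled: earlier layers still mix tokens through attention, which is why the other per-token states still carry some overall meaning.

```python
import numpy as np


def clip_style_pool(hidden, token_ids, proj):
    # Assumed CLIP-style pooling: keep only the final hidden state at the
    # end-of-text (EOT) token and project it. In OpenAI's reference code the
    # EOT token has the largest id, so argmax over ids finds its position.
    eot_pos = int(np.argmax(token_ids))
    return hidden[eot_pos] @ proj


def cross_attention(query, hidden):
    # Simplified single-head cross-attention (no learned Q/K/V projections):
    # the output is a softmax-weighted mix of every token's hidden state.
    scores = hidden @ query / np.sqrt(hidden.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ hidden


rng = np.random.default_rng(0)
d_model, d_embed = 8, 4
# Hypothetical token ids: 49406/49407 mimic CLIP's start/end-of-text markers;
# the two ids in between stand in for content tokens.
token_ids = np.array([49406, 320, 1125, 49407])
hidden = rng.normal(size=(len(token_ids), d_model))  # final-layer token states
proj = rng.normal(size=(d_model, d_embed))           # hypothetical text projection
query = rng.normal(size=d_model)                     # e.g. one image-side query

# Perturb the final-layer state of a content token (position 1).
perturbed = hidden.copy()
perturbed[1] += rng.normal(size=d_model)

# CLIP-style pooling never looks at position 1's final state...
pooled_same = np.allclose(clip_style_pool(hidden, token_ids, proj),
                          clip_style_pool(perturbed, token_ids, proj))
# ...while a cross-attention consumer sees every token's state.
attn_same = np.allclose(cross_attention(query, hidden),
                        cross_attention(query, perturbed))
print("CLIP-style pooled output unchanged:", pooled_same)
print("cross-attention output unchanged:", attn_same)
```

	Since the contrastive loss only ever sees the pooled vector, nothing directly trains the other final-layer token states to be useful on their own; a consumer that attends over the whole sequence depends on every one of them.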