| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by minimaxir 670 days ago
	The diffusion process is conditioned on CLIP text, which works better (in theory) since the encoded text is aligned with images.