| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by astrange 1015 days ago
	The image model knows what images look like even without prompts, and if you train it on a trillion images it will create a latent space where similar pictures have similar embeddings. Inaccurate captions for some of them may mean that the text encoder can't get you to those embeddings, but they're still in there. What this means is that text prompting is a bad way to drive an image generating model.