| HN Mirror



	by azeirah 250 days ago
	Isn't the space you're talking about just the input images that are close to the textual prompt? These models are trained on image+text pairs, so if you prompt something like "an apple" you get a conceptual average of all images containing apples. Depending on your dataset, that's likely going to be a photograph with an apple in the center.
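
To make the "conceptual average" intuition concrete, here is a toy sketch. It is only an illustration of the idea in the comment above, not how a real text-to-image model works internally: the dataset, the two made-up features (whether the object is centered, whether the image is a photograph), and the `conceptual_average` helper are all assumptions invented for the example.

```python
# Toy illustration only: real text-to-image models do not literally average images.
# The dataset, feature vectors, and function name are hypothetical.

def conceptual_average(pairs, prompt):
    """Return the mean feature vector of all images whose caption mentions the prompt."""
    matches = [features for caption, features in pairs if prompt in caption.lower()]
    if not matches:
        raise ValueError(f"no captions mention {prompt!r}")
    dims = len(matches[0])
    return [sum(vec[i] for vec in matches) / len(matches) for i in range(dims)]

# Hypothetical features per image: [object is centered, image is a photograph].
pairs = [
    ("a photo of an apple on a table", [1.0, 1.0]),
    ("red apple, studio photograph", [1.0, 1.0]),
    ("apple in the corner of a still-life painting", [0.0, 0.0]),
    ("photo of an apple tree", [0.0, 1.0]),
    ("a dog on a beach", [1.0, 1.0]),
]

average = conceptual_average(pairs, "apple")
print(average)
```

With this made-up data, the "apple" average leans toward "photograph" more than anything else, which is the dataset-dependence the comment is pointing at: change the mix of captions and the average moves with it.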