Hacker News


	by Aivean 1683 days ago
	GLIDE is NOT Dall-E. Dall-E is a transformer (basically GPT-3), while GLIDE is a diffusion model. Although they share some similarities, the major difference is that a transformer generates an image sequentially from top to bottom, pixel by pixel (technically, token by token), so it can be conditioned only on the text and the already-generated top part of the image. Diffusion models, by contrast, predict all pixels at the same time, so one can naturally trade compute for result quality (run more inference iterations) and, besides sampling, perform other image manipulation tasks, such as text-prompted inpainting.
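	To make the contrast concrete, here is a toy sketch of the two sampling loops. It is not Dall-E or GLIDE code. `toy_next_token` and `toy_denoise` are hypothetical stand-ins for trained networks, with no learned weights and no real noise schedule. The point is the shape of each loop: the autoregressive sampler fills one position at a time and sees only the text plus the prefix, while the diffusion-style sampler updates every pixel on each step. Here, inpainting is done by clamping known pixels after every step. That is a simplifying assumption and not necessarily how GLIDE does it.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for trained networks (illustration only, no learning).
def toy_next_token(text: str, prefix: List[int]) -> int:
    """Pretend transformer: next token depends only on the text and the prefix."""
    return (len(text) + sum(prefix)) % 16

def toy_denoise(x: List[float], step: int, total: int, text: str) -> List[float]:
    """Pretend denoiser: nudges every pixel at once toward a text-dependent target."""
    target = 1.0 if "bright" in text else 0.0
    alpha = 1.0 / (total - step)  # alpha == 1 on the last step, so pixels land on the target
    return [v + alpha * (target - v) for v in x]

def sample_autoregressive(text: str, length: int,
                          next_token: Callable[[str, List[int]], int] = toy_next_token) -> List[int]:
    tokens: List[int] = []
    for _ in range(length):  # one position at a time, in raster (top-to-bottom) order
        # Each token sees only the text and the tokens generated so far.
        tokens.append(next_token(text, tokens))
    return tokens

def sample_diffusion(text: str, size: int, steps: int,
                     known: Optional[Dict[int, float]] = None,
                     denoise: Callable[[List[float], int, int, str], List[float]] = toy_denoise,
                     seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # start from pure noise
    known = known or {}
    # In a real model, more steps cost more compute and usually improve quality;
    # in this toy, the loop always converges to the target.
    for step in range(steps):
        x = denoise(x, step, steps, text)  # all pixels are updated together
        for i, v in known.items():  # naive inpainting: re-impose known pixels
            x[i] = v
    return x

print(sample_autoregressive("a cat", 5))
print(sample_diffusion("bright sky", 4, 10, known={0: 0.25}))
```

	The autoregressive loop has no way to use pixels that come later in raster order. The diffusion loop can enforce constraints anywhere in the image, because every step revisits the whole canvas.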