Hacker News new | ask | show | jobs
by Aivean 1636 days ago
GLIDE is NOT Dall-E. Dall-E is a transformer (basically GPT-3), while GLIDE is a diffusion model. While they share some similarities, the major difference is that transformers generate image sequentially from top to bottom, pixel-by-pixel (technically, token-by-token), so one can condition them only by the text and the top of the image. At the same time, diffusion models predict all pixels at the same time, so one can naturally trade compute for result quality (do more inference iterations) and, beside sampling, do other image manipulation tasks, like text-prompted inpainting.