Hacker News
by danielbln 1180 days ago
Generative image models don't use transformers; they're diffusion models. LLMs are transformers.

Diffusion models can use a transformer architecture, for example DiT (Diffusion Transformer). Stable Diffusion uses a U-Net architecture with transformer blocks.
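To see how a plain transformer can act as the denoiser in a diffusion model, it helps to look at the first step of a DiT-style model: the latent is cut into non-overlapping p x p patches, and each patch becomes one token. The sketch below is a minimal, hypothetical illustration of that "patchify" step using nested Python lists (C x H x W). It assumes H and W are divisible by p and deliberately omits the learned linear embedding and positional embeddings a real model would apply to each token.

```python
def patchify(latent, p):
    """Split a C x H x W latent (nested lists) into non-overlapping p x p patches.

    Returns (H // p) * (W // p) tokens, ordered row-major over patch positions.
    Each token is a flat list of C * p * p values: all channels of one patch.
    """
    c = len(latent)
    h = len(latent[0])
    w = len(latent[0][0])
    if h % p or w % p:
        raise ValueError("H and W must be divisible by the patch size p")
    tokens = []
    for i in range(0, h, p):          # patch row start
        for j in range(0, w, p):      # patch column start
            token = [latent[ch][i + di][j + dj]
                     for ch in range(c)
                     for di in range(p)
                     for dj in range(p)]
            tokens.append(token)
    return tokens


# Example: a 4 x 32 x 32 latent with p = 2 yields 16 * 16 = 256 tokens of length 4 * 2 * 2 = 16.
latent = [[[0.0] * 32 for _ in range(32)] for _ in range(4)]
tokens = patchify(latent, 2)
print(len(tokens), len(tokens[0]))  # 256 16
```

The resulting token sequence is what a standard transformer would then process with self-attention, whereas a U-Net keeps the spatial grid and only inserts attention blocks at some resolutions.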
Ah yes, that's right. Well, as I understand it, they technically do use a transformer for the CLIP text encoder.
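One way the text encoder's output reaches the image model is through cross-attention inside those transformer blocks: image latent tokens act as queries, and the text-encoder token embeddings supply the keys and values. The following is a minimal single-head sketch of scaled dot-product cross-attention in plain Python. It is an illustration under stated assumptions, not Stable Diffusion's actual code: the learned projections (W_Q, W_K, W_V) and multiple heads are omitted, so queries and keys must share a dimension d, and any numbers you pass in are toy stand-ins for real latents and text embeddings.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention; each query attends over all keys.

    queries: list of d-dim vectors (e.g. image latent tokens)
    keys:    list of d-dim vectors (e.g. text-encoder token embeddings)
    values:  list of vectors, one per key (same length as keys)
    Returns one output vector per query, a softmax-weighted mix of the values.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out
```

Because every image token mixes in information from every text token, the text prompt can steer each spatial location of the denoised latent.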