Hacker News new | ask | show | jobs
by tripplyons 773 days ago
To your 3rd point, most diffusion models already use a transformer-based architecture (U-Net with self attention and cross attention, Vision Transformer, Diffusion Transformer, etc.).