by tripplyons 773 days ago
To your 3rd point, most diffusion models already use a transformer-based architecture (U-Net with self-attention and cross-attention, Vision Transformer, Diffusion Transformer, etc.).
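
For anyone unfamiliar with the two attention types mentioned, here is a rough sketch of how they differ inside a denoiser block. It is not taken from any particular model: the token values are made up, and the learned query/key/value projections that real models use are replaced with the identity to keep it short. The function name `attention` and the toy data are assumptions for illustration only.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys_values):
    # Scaled dot-product attention. Simplifying assumption: identity
    # projections, so keys and values are the same vectors and there
    # are no learned weights.
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                    for j in range(d)])
    return out

# Hypothetical toy data: 4 image-patch tokens and 2 text-conditioning
# tokens, each of width 3.
image_tokens = [[0.1, 0.2, 0.3], [0.0, 1.0, 0.0], [0.5, 0.5, 0.5], [1.0, 0.0, 0.0]]
text_tokens = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Self-attention: image tokens attend to each other.
self_attended = attention(image_tokens, image_tokens)
# Cross-attention: image tokens attend to the conditioning (prompt) tokens.
cross_attended = attention(self_attended, text_tokens)
```

The only difference between the two calls is where the keys and values come from: the same sequence for self-attention, the conditioning sequence for cross-attention. The output always has one row per query token.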