by BarakWidawsky 13 days ago
You're mostly right, but you're conflating attention with autoregressive/causal generation, which is the real issue that prevents you from using more compute.

You can use diffusion with attention, and this model does in fact use attention.
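
To make the distinction concrete, here's a rough sketch (plain Python, toy inputs I made up, not the model's actual code): the attention op is the same either way, and the causal mask is the only thing that makes it autoregressive. A diffusion-style denoiser would just leave the mask off so every position can see every other one.

```python
import math

def attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns (outputs, weights). With causal=True, position i can only
    attend to positions j <= i, which is the autoregressive constraint.
    """
    n, d = len(q), len(q[0])
    outputs, weights = [], []
    for i in range(n):
        # Scaled similarity of query i to every key.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(n)
        ]
        if causal:
            # Hide future positions by sending their scores to -inf.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        # Numerically stable softmax over the (possibly masked) scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        w = [e / z for e in exps]
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        )
    return outputs, weights

# Toy 3-token sequence (hypothetical values, for illustration only).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
_, w_full = attention(x, x, x, causal=False)   # bidirectional, as a denoiser could use
_, w_causal = attention(x, x, x, causal=True)  # autoregressive
```

In the causal case the first token puts all of its weight on itself; in the unmasked case it also attends to the later tokens. Same attention, different constraint.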

1 comment

Yes, I should have said autoregressive.