by
BarakWidawsky
13 days ago
You’re mostly right, but you’re conflating attention with autoregressive/causal generation, which is the real issue that prevents you from using more compute.
You can use diffusion with attention, and this model does in fact use attention.
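To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy, not the model under discussion: the `attention` function, the `causal` flag, and the token values are all made up for illustration. It shows that the causal mask is an optional add-on to attention. With the mask, a token's output cannot depend on later tokens; without it, the same attention computation sees the whole sequence, which is the setting a diffusion-style model can use.

```python
import math

def attention(x, causal=False):
    """Toy single-head self-attention with q = k = v = x (a list of token vectors)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Causal (autoregressive) masking: token i sees only tokens 0..i.
        # Without the mask, token i sees every token in the sequence.
        visible = range(i + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d)
                  for j in visible]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * x[j][c] for e, j in zip(exps, visible))
                    for c in range(d)])
    return out

# Hypothetical toy sequence of three 2-dimensional tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edited = [row[:] for row in tokens]
edited[-1] = [2.0, -1.0]  # change only the last token

# Does the first token's output stay the same when the last token changes?
causal_first_unchanged = attention(tokens, causal=True)[0] == attention(edited, causal=True)[0]
full_first_unchanged = attention(tokens)[0] == attention(edited)[0]
```

With the causal mask the first token's output ignores the edit to the last token; without the mask it changes. Both paths run the same attention arithmetic, so the constraint comes from the mask rather than from attention itself.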
samuelknight
13 days ago
Yes, I should have said autoregressive.