by bicsi
314 days ago
What if I told you that one can model bidirectional attention just by recurring over causal attention, and it’s still fast enough? Hint: it’s called chain of thought. I strongly believe it’s time to discontinue diffusion models, solely because iterated autoregression is faster, more parallelizable, and just as potent with proper prompting techniques (unless, of course, you consider CoT a form of diffusion, which it essentially is).