Hacker News new | ask | show | jobs
by bicsi 314 days ago
What if I told you that one can model bidirectional attention just by recurring over causal attention, and it’s still fast enough? Hint: It’s called chain of thought.

I strongly believe it’s time to discontinue diffusion models, solely on the fact that iterated auto-regression is faster, more parallelizable, and just as potent with proper prompting techniques (of course, unless you consider CoT as a form of diffusion, which it essentially is).

3 comments

I'd respectfully suggest that it's perhaps not time to "discontinue diffusion models". Minsky and Papert set AI back by decades by suggesting neural networks were a dead end which couldn't learn XOR. There's not a chance of an HN comment having the same effect of course but my point is that it's easy to dismiss things prematurely.
Can you explain how CoT is a form of diffusion or models bidirectional attn?
Chain of thought is not a form of diffusion. Diffusion models clearly have characteristics that are useful and worthy of further research and should not be “discontinued“