by cscurmudgeon 584 days ago
Are there any intrinsic advantages or disadvantages of bidirectional models over causal models for in-context learning? It seems that unidirectional models have simply been explored and worked on more.

If you train only bidirectionally, you don't get a generative model; that would be the downside. However, you can train on a mixture of causal and bidirectional objectives, as some LLM pre-training has done. As far as I am aware, there are no downsides to that. It is not more common simply because the standard practice has been to train causal-only, and there just isn't enough funding or attention to experiment on every axis of pre-training (which can be very expensive).
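To make the "mixture of objectives" idea concrete, here is a minimal sketch in plain Python of how one training example could be routed to either a causal (next-token) objective or a bidirectional (masked-token) objective. Everything here is an assumption for illustration: the `<mask>` token, the 15% masking rate, the 50/50 mixing probability, and all function names are hypothetical and not taken from any specific pre-training recipe.

```python
import random

MASK = "<mask>"  # hypothetical mask token, for illustration only


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def causal_example(tokens):
    """Next-token prediction: the target at step i is the token at i + 1."""
    return tokens[:-1], tokens[1:]


def masked_example(tokens, rng, mask_prob=0.15):
    """Masked-token prediction: hide some tokens, predict only the hidden ones."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(None)  # None means "no loss at this position"
    return inputs, targets


def mixed_example(tokens, rng, p_causal=0.5):
    """Pick one objective per example (assumed mixing scheme)."""
    if rng.random() < p_causal:
        inputs, targets = causal_example(tokens)
        return "causal", causal_mask(len(inputs)), inputs, targets
    inputs, targets = masked_example(tokens, rng)
    return "bidirectional", bidirectional_mask(len(inputs)), inputs, targets


if __name__ == "__main__":
    rng = random.Random(0)
    kind, mask, inputs, targets = mixed_example("the cat sat down".split(), rng)
    print(kind, inputs, targets)
```

The only structural difference between the two branches is the attention mask and which positions carry a loss, which is why the same network can in principle be trained on both.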
No, you can generate with them using diffusion.
Yep. That technique works very well. Surprised that it’s not more widely used.
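For readers wondering what generating from a bidirectional model can look like, here is a minimal sketch of iterative unmasking: start from an all-masked sequence, ask the model to fill in every masked position, commit only the most confident predictions, and repeat. This is closely related to absorbing-state discrete diffusion, but it is an assumption that this is the variant meant above. The `toy_denoiser` is a hypothetical stand-in for a trained model (it simply looks up a fixed target), and all names are made up for illustration.

```python
MASK = None  # hypothetical marker for a still-masked position


def toy_denoiser(seq, target):
    """Stand-in for a trained bidirectional model (NOT a real model).

    Returns {position: (token, confidence)} for each masked position.
    Confidence is higher when more neighbours are already filled in.
    """
    preds = {}
    for i, tok in enumerate(seq):
        if tok is MASK:
            neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
            filled = sum(t is not MASK for t in neighbours)
            preds[i] = (target[i], filled + 1.0 / (i + 1))
    return preds


def iterative_unmask(length, denoise, steps):
    """Fill an all-masked sequence over a fixed number of refinement steps."""
    seq = [MASK] * length
    for step in range(steps):
        preds = denoise(seq)
        if not preds:
            break  # nothing left to fill
        remaining_steps = steps - step
        # Ceil division, so that everything is committed by the final step.
        k = max(1, -(-len(preds) // remaining_steps))
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            seq[i] = preds[i][0]
    return seq


if __name__ == "__main__":
    target = list("hello")
    print(iterative_unmask(len(target), lambda s: toy_denoiser(s, target), steps=3))
```

Unlike left-to-right decoding, each step here can condition on tokens on both sides of a gap, which is exactly what a bidirectional model is trained to do.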
This is very interesting. Have you got any references describing this approach?
I'll try to remember to dig up some of the ones I read back when I did a literature review of the subject (probably about 11 months ago).
Isn't Q* (or Quiet-STaR) a causal and bidirectional objective learning system?