by patelajay285 578 days ago
If you train only bidirectionally, you don't get a generative model; that would be the downside. However, you can train on a mixture of causal and bidirectional objectives, as some LLM pre-training has done. As far as I am aware, there are no downsides to that, but it is not more common simply because the standard practice has been to train causal-only, and there just isn't enough funding/attention to experiment on every axis of pre-training (which can be very expensive).
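To make the "mixture of objectives" idea concrete, here is a rough sketch in plain Python of how a data pipeline might build examples for each objective. It is not taken from any particular paper or codebase: the `<mask>` token, the 15% mask rate, the 50/50 sampling, and all function names are assumptions for illustration only.

```python
import random

MASK = "<mask>"  # hypothetical mask token, an assumption for this sketch


def causal_mask(n):
    # Position i may attend only to positions 0..i (lower triangular).
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    # Every position may attend to every other position.
    return [[True] * n for _ in range(n)]


def make_example(tokens, objective, rng, mask_rate=0.15):
    """Build (inputs, targets, attention_mask) for one non-empty sequence."""
    if objective == "causal":
        # Next-token prediction: the target is the input shifted by one.
        inputs, targets = tokens[:-1], tokens[1:]
        return inputs, targets, causal_mask(len(inputs))
    if objective == "bidirectional":
        # Masked prediction: hide a few tokens, predict only those.
        n = len(tokens)
        k = max(1, round(mask_rate * n))
        hidden = set(rng.sample(range(n), k))
        inputs = [MASK if i in hidden else t for i, t in enumerate(tokens)]
        # None marks positions that contribute no loss.
        targets = [t if i in hidden else None for i, t in enumerate(tokens)]
        return inputs, targets, bidirectional_mask(n)
    raise ValueError(f"unknown objective: {objective}")


def sample_objective(rng, p_causal=0.5):
    # The mixture: pick an objective per example with a fixed probability.
    return "causal" if rng.random() < p_causal else "bidirectional"


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat sat on the mat".split()
    for _ in range(3):
        objective = sample_objective(rng)
        inputs, targets, _ = make_example(sentence, objective, rng)
        print(objective, inputs, targets)
```

The same model weights would see both kinds of example; only the attention mask and the positions that carry loss change. How the two are weighted in a real pre-training run is exactly the kind of axis that is expensive to explore.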

No, you can generate with them using diffusion.
Yep. That technique works very well. Surprised that it’s not more widely used.
This is very interesting. Have you got any references describing this approach?
I'll try to remember to search some out that I read back when I did a literature review of the subject (probably was 11 months ago).
Isn't Q* (or Quiet-STaR) a causal and bidirectional objective learning system?