| HN Mirror

	by cscurmudgeon 584 days ago
	Are there any intrinsic advantages or disadvantages of bidirectional models over causal models for in-context learning? It seems that unidirectional models have simply been explored and worked on more.


patelajay285 584 days ago

The downside of training with a bidirectional objective only is that you don't get a generative model. However, you can train on a mixture of causal and bidirectional objectives, as some LLM pre-training has done. As far as I am aware, there are no downsides to that; it is not more common simply because standard practice has been to train causal-only, and there isn't enough funding or attention to experiment on every axis of pre-training (which can be very expensive).
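
For anyone who wants to picture the difference, here is a minimal sketch of the attention masks involved. The `attention_mask` and `show` helpers and the `prefix_len` parameter are made up for illustration, and the prefix-style mask is only one possible way to mix the two regimes, not necessarily what any particular pre-training recipe does.

```python
def attention_mask(seq_len, prefix_len=0):
    """Return mask[i][j] == True when position i may attend to position j.

    prefix_len=0        -> purely causal (each token sees itself and the past)
    prefix_len=seq_len  -> fully bidirectional
    anything in between -> bidirectional over the prefix, causal afterwards
    """
    if not 0 <= prefix_len <= seq_len:
        raise ValueError("prefix_len must be between 0 and seq_len")
    return [
        [j <= i or j < prefix_len for j in range(seq_len)]
        for i in range(seq_len)
    ]


def show(mask):
    """Render a mask as text: 'x' = may attend, '.' = blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


causal = attention_mask(4)
bidirectional = attention_mask(4, prefix_len=4)
mixed = attention_mask(4, prefix_len=2)

print(show(causal))
print()
print(show(mixed))
```

With the causal mask, position 0 never sees later tokens; with the mixed mask, the first two positions see each other in both directions while the rest stay left-to-right.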


namibj 584 days ago

No, you can generate with them using diffusion.
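
A toy sketch of the general idea, assuming the simplest relative of this approach: start from a fully masked sequence and let a bidirectional model fill in its most confident positions over a few steps. `toy_denoiser` is a hypothetical stand-in that always steers toward a fixed sentence; actual diffusion language models use learned predictions, noise schedules, and sampling, none of which is shown here.

```python
MASK = None  # placeholder for a position that has not been generated yet


def toy_denoiser(tokens):
    """Hypothetical stand-in for a bidirectional model.

    Returns {position: (token, confidence)} for every masked position.
    A real model would compute these from the full left and right context.
    """
    target = ["the", "cat", "sat", "down"]
    out = {}
    for i, tok in enumerate(tokens):
        if tok is MASK:
            # Pretend the model is more confident next to already-known tokens.
            neighbours = [tokens[k] for k in (i - 1, i + 1) if 0 <= k < len(tokens)]
            known = sum(t is not MASK for t in neighbours)
            out[i] = (target[i], 0.5 + 0.2 * known - 0.01 * i)
    return out


def iterative_unmask(length, steps, denoiser):
    """Fill a fully masked sequence over a fixed number of refinement steps."""
    tokens = [MASK] * length
    for step in range(steps):
        preds = denoiser(tokens)
        if not preds:
            break
        # Reveal an even share of the remaining masked positions per step.
        n_reveal = -(-len(preds) // (steps - step))  # ceiling division
        best = sorted(preds, key=lambda i: preds[i][1], reverse=True)[:n_reveal]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


result = iterative_unmask(4, steps=2, denoiser=toy_denoiser)
print(result)
```

The point is only that generation does not have to be left-to-right: every step conditions on tokens on both sides of each masked position.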


zxexz 584 days ago

Yep. That technique works very well. Surprised that it’s not more widely used.


byefruit 583 days ago

This is very interesting. Have you got any references describing this approach?


namibj 571 days ago

I'll try to remember to dig up some of the references I read back when I did a literature review of the subject (probably about 11 months ago).


mycall 584 days ago

Isn't Q* (or Quiet-STaR) a causal and bidirectional objective learning system?
