| HN Mirror



	by radarsat1 71 days ago
	This reminds me a lot of the tricks for turning BERT into a generative model. I guess the causal masking, which keeps it essentially autoregressive, is an important difference though. Kind of the best of both worlds.

1 comment

krackers 66 days ago

Masked language modeling has been loosely compared to text diffusion [1], so the paper's title claim may be true in some sense even if it's misleading.

[1] https://nathan.rs/posts/roberta-diffusion/
