| HN Mirror



	by itchyjunk 582 days ago
	Wait, I thought GPTs were autoregressive, and encoder-only models like BERT used masked tokens? You're saying BERT is autoregressive, or am I misunderstanding?

2 comments

woadwarrior01 582 days ago

You're right. Encoder-only models like BERT aren't auto-regressive and are trained with the masked language modeling (MLM) objective. Decoder-only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the causal language modeling (CLM) and sometimes the PrefixLM objectives.
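
To make the difference between the two objectives concrete, here is a rough sketch of how training examples could be built from one token sequence. The helper names (`clm_examples`, `mlm_example`) and the plain 15% masking rate are assumptions for illustration only; BERT's actual masking recipe is more involved, and this is not how any particular library implements it.

```python
import random

MASK = "[MASK]"  # placeholder mask token, chosen for illustration


def clm_examples(tokens):
    """Causal LM: each target is predicted from the tokens to its left only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM: hide some positions; the model sees context on both sides."""
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)   # hidden position the model must recover
            targets[i] = tok      # remember the original token as the label
        else:
            inputs.append(tok)
    return inputs, targets


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for context, target in clm_examples(sentence):
        print(context, "->", target)
    print(mlm_example(sentence, mask_prob=0.5))
```

In the CLM case every prediction is conditioned only on a prefix, which is what makes left-to-right generation natural. In the MLM case the masked positions can sit anywhere, with visible tokens on both sides.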


ipsum2 582 days ago

You can mask out the tokens at the end, so it's technically autoregressive.
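
A toy sketch of that idea, assuming a hypothetical `fill_mask` callable that takes a token list ending in a mask and returns a prediction for the masked slot. `toy_fill_mask` below is only a stand-in so the loop runs; it is not a real model, and a masked LM used this way may well generate poorly since it was not trained for it.

```python
MASK = "[MASK]"  # placeholder mask token, chosen for illustration


def generate_with_trailing_mask(prompt, fill_mask, steps):
    """Append a mask at the end, fill it, keep the prediction, and repeat."""
    tokens = list(prompt)
    for _ in range(steps):
        predicted = fill_mask(tokens + [MASK])  # only the last slot is hidden
        tokens.append(predicted)
    return tokens


def toy_fill_mask(tokens):
    """Hypothetical stand-in for a masked LM: names the slot by its index."""
    return f"tok{len(tokens) - 1}"


if __name__ == "__main__":
    print(generate_with_trailing_mask(["a", "b"], toy_fill_mask, 3))
    # prints ['a', 'b', 'tok2', 'tok3', 'tok4']
```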
