| HN Mirror



	by osanseviero 582 days ago
	Yes, they are still used.

	- Encoder-based models have much faster inference (are auto-regressive) and are smaller. They are great for applications where speed and efficiency are key.
	- Most embedding models are BERT-based (see the MTEB leaderboard), so they are widely used for retrieval.
	- They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoberta) to generate quality scores for documents. Something similar is done for FineWeb Edu.
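A minimal sketch of how BERT-style embeddings are typically used for retrieval: embed the query and each document once, then rank documents by cosine similarity. The vectors below are toy stand-ins for real encoder outputs, and `cosine` and `top_k` are hypothetical helper names, not any library's API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k documents most similar to the query, best first."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy vectors standing in for encoder outputs (e.g. pooled BERT embeddings).
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.0, 0.0]
print(top_k(query, docs, k=2))  # [0, 1]
```

Because documents are encoded once and only the query is encoded at search time, a small, fast encoder keeps the per-query cost low.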


itchyjunk 582 days ago

Wait, I thought GPTs were autoregressive and encoder-only models like BERT used masked tokens? Are you saying BERT is auto-regressive, or am I misunderstanding?


woadwarrior01 582 days ago

You're right. Encoder-only models like BERT aren't auto-regressive and are trained with the MLM (masked language modeling) objective. Decoder-only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the CLM (causal language modeling) and sometimes the PrefixLM objectives.
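To make the distinction concrete, here is a small plain-Python sketch (no model involved) of the attention patterns and training targets behind these objectives. The helper names are hypothetical, and `mlm_example` is simplified: it always substitutes `[MASK]`, whereas BERT's recipe also sometimes keeps or randomizes a selected token.

```python
import random

MASK = "[MASK]"

def bidirectional_mask(n):
    """Encoder-only (BERT): every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """CLM (decoder-only, GPT): position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def prefix_lm_mask(n, prefix_len):
    """PrefixLM: the prefix is visible to every position; the rest is causal."""
    return [[1 if (j < prefix_len or j <= i) else 0 for j in range(n)]
            for i in range(n)]

def clm_examples(tokens):
    """CLM targets: predict each token from everything before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def mlm_example(tokens, rate=0.15, rng=None):
    """Simplified MLM: replace roughly `rate` of tokens with [MASK].

    Returns the corrupted sequence and a dict mapping masked position -> original token.
    """
    rng = rng or random.Random(0)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets

print(causal_mask(3))        # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(prefix_lm_mask(3, 2))  # [[1, 1, 0], [1, 1, 0], [1, 1, 1]]
print(clm_examples(["the", "cat", "sat"]))
```

The causal mask is what makes left-to-right generation natural: each position's prediction depends only on earlier tokens, so training and generation see the same view of the sequence.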


ipsum2 582 days ago

You can mask out the tokens at the end, so it's technically autoregressive.
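A rough sketch of what that looks like: append a `[MASK]` at the end, let the masked-LM head fill it in, and repeat. `toy_predict` is a hypothetical stand-in for a real MLM head (here just a bigram lookup), and `generate_with_mlm` is an illustrative name, not a library function.

```python
MASK = "[MASK]"

def generate_with_mlm(prompt, predict_masked, max_new_tokens=5, stop=None):
    """Greedy left-to-right generation using only a fill-in-the-mask predictor."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(MASK)             # put a mask at the end of the sequence
        pos = len(tokens) - 1
        tok = predict_masked(tokens, pos)
        tokens[pos] = tok               # replace the mask with the prediction
        if tok == stop:
            break
    return tokens

# Hypothetical stand-in for a masked-LM head: predicts from the previous token only.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "down", "down": "."}

def toy_predict(tokens, pos):
    return BIGRAMS.get(tokens[pos - 1], ".")

print(generate_with_mlm(["the"], toy_predict, max_new_tokens=10, stop="."))
# ['the', 'cat', 'sat', 'down', '.']
```

One caveat: with bidirectional attention, appending a token can change every earlier position's representation, so a real encoder has to re-encode the whole sequence at each step rather than reusing cached keys and values the way a decoder does.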
