Hacker News new | ask | show | jobs
by osanseviero 582 days ago
Yes, they are still used

- Encoder based models have much faster inference (are auto-regressive) and are smaller. They are great for applications where speed and efficiency are key. - Most embedding models are BERT-based (see MTEB leaderboard). So widely used for retrieval. - They are also used to filter data for pre-training decoder models. The Llama 3 authors used a quality classifier (DistilRoberta) to generate quality scores for documents. Something similar is done for FineWeb Edu

1 comments

Wait, I thought GPT's were autoregressive and encoder only like BERT used masked tokens? You're saying BERT is auto-regressive or am I misunderstanding?
You're right. Encoder only models like BERT aren't auto-regressive and are trained with the MLM objective. Decoder only (GPT) and encoder-decoder (T5) models are auto-regressive and are trained with the CLM and sometimes the PrefixLM objectives.
You can mask out the tokens at the end, so its technically autoregressive.