| HN Mirror



	by janalsncm 544 days ago
	On your second point: most modern LLMs are decoder-only. As for why adding a classification head to one isn’t optimal, the decoders you’re referring to have 10x the parameters and aren’t trained on encoder-type objectives like masked language modeling (MLM). So there’s really no advantage on any dimension.
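To make the encoder/decoder distinction concrete, here is a toy, stdlib-only sketch. It is not any library's API, and every function name in it is hypothetical. An MLM-trained encoder attends bidirectionally and learns to fill in masked tokens, while a decoder uses a causal mask. So a classification head bolted onto a decoder typically has to pool the final token, since that is the only position that has seen the whole input. The 15% mask rate follows BERT's convention. Always substituting `[MASK]` is a simplification of BERT's actual replacement scheme.

```python
import random


def attention_mask(n: int, causal: bool) -> list[list[bool]]:
    """mask[i][j] is True if position i may attend to position j.
    Decoder (causal): only j <= i. Encoder (bidirectional): every position."""
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def visible_context(mask: list[list[bool]], i: int) -> list[int]:
    """Positions that token i can see under the given mask."""
    return [j for j, ok in enumerate(mask[i]) if ok]


def mlm_corrupt(tokens, mask_token="[MASK]", rate=0.15, seed=0):
    """Simplified MLM corruption (assumption: always replace with [MASK]).
    The encoder is trained to recover `targets` using context on both sides."""
    rng = random.Random(seed)
    n_mask = max(1, round(rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = mask_token
    targets = {p: tokens[p] for p in positions}
    return corrupted, targets


def classification_head(hidden, weights, bias, causal):
    """Hypothetical linear head over a pooled hidden state.
    Decoder: pool the last token. Encoder: mean-pool all positions."""
    if causal:
        pooled = hidden[-1]
    else:
        dim = len(hidden[0])
        pooled = [sum(h[d] for h in hidden) / len(hidden) for d in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]
```

Last-token pooling is the usual choice for decoders because, under the causal mask, earlier positions never attend to later tokens. In an encoder, every position sees the full sequence, so mean pooling (or a dedicated `[CLS]` token) works.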