HN Mirror



	by zozbot234 41 days ago
	I assume these are just output layers trained on the hidden state from the larger model; that's how MTP (multi-token prediction) works. It's not a separate drafting model.
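A toy sketch of what the comment describes: extra output heads that read the big model's hidden state, as opposed to a separate drafting model. It makes several loud assumptions. The "big model" is replaced by fixed one-hot hidden states. Each head is a single linear layer plus softmax, fit by plain gradient descent while the hidden states stay frozen. All names (`train_head`, `draft_two_tokens`, `VOCAB`, `HIDDEN`) are hypothetical. Real MTP implementations may be structured differently.

```python
import math

VOCAB = 8   # toy vocabulary size (hypothetical)
HIDDEN = 4  # toy hidden-state width (hypothetical)

def linear(weights, x):
    # weights has VOCAB rows of HIDDEN columns; returns one logit per token
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(xs):
    return max(range(len(xs)), key=xs.__getitem__)

def train_head(hidden_states, targets, lr=0.5, epochs=50):
    """Fit one output head on frozen hidden states with softmax cross-entropy.

    Only the head's weights change; the model producing hidden_states is untouched.
    """
    w = [[0.0] * HIDDEN for _ in range(VOCAB)]
    for _ in range(epochs):
        for h, target in zip(hidden_states, targets):
            probs = softmax(linear(w, h))
            for v in range(VOCAB):
                # d(loss)/d(logit_v) = p_v - 1[v == target]
                g = probs[v] - (1.0 if v == target else 0.0)
                for j in range(HIDDEN):
                    w[v][j] -= lr * g * h[j]
    return w

def draft_two_tokens(main_head, mtp_head, h):
    # One forward pass of the big model gives h; both heads read that same h,
    # so no second (drafting) model has to run.
    next_tok = argmax(linear(main_head, h))   # token t+1 (ordinary LM head)
    draft_tok = argmax(linear(mtp_head, h))   # token t+2 (extra MTP head)
    return next_tok, draft_tok

# Stand-ins for the frozen big model's hidden states at four positions.
hidden = [[1.0 if j == k else 0.0 for j in range(HIDDEN)] for k in range(HIDDEN)]
next_targets = [3, 0, 6, 4]   # token at t+1 for each position
plus2_targets = [5, 2, 7, 1]  # token at t+2 for each position

main_head = train_head(hidden, next_targets)
mtp_head = train_head(hidden, plus2_targets)

drafts = [draft_two_tokens(main_head, mtp_head, h) for h in hidden]
print(drafts)  # [(3, 5), (0, 2), (6, 7), (4, 1)]
```

In speculative decoding the t+2 guess would then be checked by the big model on its next pass, but here it is only computed, not verified.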