HN Mirror

	by HarHarVeryFunny 257 days ago
	When you say "classification task fine tuning", are you referring to RLHF? RLHF seems to have been the critical piece that "aligned" the otherwise rather wild output of an LLM trained purely "causally" (by next-token prediction) with what a human expects in terms of conversational turn-taking (e.g. Q&A) and instruction following, as well as more general preferences and expectations.