by HarHarVeryFunny 257 days ago
When you say "classification task fine tuning", are you referring to RLHF?

RLHF seems to have been the critical piece that "aligned" the otherwise rather wild output of an LLM trained purely "causally" (by next-token prediction) with what a human expects in terms of conversational turn-taking (e.g. Q&A) and instruction following, as well as more general preferences and expectations.