by HarHarVeryFunny
257 days ago
When you say "classification task fine tuning", are you referring to RLHF? RLHF seems to have been the critical piece that "aligned" the otherwise rather wild output of an LLM trained purely "causally" (by next-token prediction) with what a human expects in terms of conversational turn-taking (e.g. Q&A) and instruction following, as well as more general preferences and expectations.