by eliomattia
1129 days ago
By human feedback in T, I did indeed mean RLHF. By chat histories H in T, I meant a later selection of user feedback. While plug-ins and added context can be visualized as g(f(H)), fine-tuning could be thought of as g(f)(H). There are humans in the loop, but in which entailment loop? Not the one affecting the hardware-like f, unless we retrain de novo.
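To make the g(f(H)) versus g(f)(H) distinction concrete, here is a minimal Python sketch. Everything in it is a hypothetical toy: `f` stands in for the frozen model, `g_output` for a plug-in that wraps the model's output, and `g_model` for fine-tuning that returns a new model. None of these names correspond to a real library.

```python
from typing import Callable, List

# A "model" here is just a function from a chat history to a reply (assumption).
Model = Callable[[List[str]], str]


def f(history: List[str]) -> str:
    """Hypothetical frozen base model: replies based on the last message."""
    return f"base-reply({history[-1]})"


def g_output(reply: str) -> str:
    """Plug-in / added context: acts on the output of f, leaving f untouched."""
    return f"plugin[{reply}]"


def g_model(model: Model) -> Model:
    """Fine-tuning: takes a model and returns a different model."""
    def tuned(history: List[str]) -> str:
        # Toy stand-in for changed behavior; a real fine-tune changes weights.
        return model(history).replace("base", "tuned")
    return tuned


H = ["hello"]

plugin_result = g_output(f(H))  # g(f(H)): g is applied to f's output
tuned_result = g_model(f)(H)    # g(f)(H): g is applied to f itself

print(plugin_result)
print(tuned_result)
print(f(H))  # the original f still behaves the same in both cases
```

In both cases the original `f` is left as it was: the plug-in only post-processes its output, and the toy "fine-tune" produces a separate function. Under these assumptions, only retraining would replace `f` itself.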
Your premise is that these are not editable models, but you restrict that to a situation that is not true in practice. These models are constantly changing based on human feedback, updated information, config changes, and weight manipulation.
If you follow the conversation around LLMs, one of the top complaints is inconsistent responses: the models change underneath people's prompts and break them.