Hacker News
by verdverm 1130 days ago
RLHF entails f for ChatGPT because you cannot get ChatGPT without RLHF. This happens before any feedback from users and was part of the initial training process. Without RLHF, you only have GPT-3, which is very different from the successors that people are actually enthralled with and worried about.

RLHF is Reinforcement Learning from Human Feedback.

Thus you cannot get ChatGPT without humans in the loop, making it quite sensitive and irreproducible.

1 comments

By human feedback in T, I did indeed mean RLHF. By chat histories H in T, I meant a later selection of user feedback. While plug-ins and added context can be visualized as g(f(H)), fine-tuning could be thought of as g(f)(H). There are humans in the loop, but in which entailment loop? Not the one affecting the hardware-like f, unless we retrain de novo.
No one wants to read your imprecise pseudo-math in plain text. Write in clear sentences.

Your premise is that these are not editable models, but you restrict that to a situation that is not true in practice. These models are constantly changing based on human feedback, updated information, config changes, and weight manipulation.

If you follow the conversation around LLMs, one of the top issues people have is inconsistent responses and the models changing under their prompts and breaking them.

https://arxiv.org/abs/2305.12907

> We find that meta-in-context learning adaptively modifies priors over latent variables, ultimately leading to priors that closely resemble the true statistics of the environment. Furthermore, our analysis reveals that meta-in-context learning can not only be used to change prior expectations but is also capable of reshaping an LLM’s learning strategies

This was done using OpenAI's public-facing APIs.

Even if it is not a permanent "edit", it still influences the model at the lowest levels.

The article can be summarized as: a context-richer prompt history yields answers that are better aligned with expectations.

> the practical constraint of a finite context window coupled with meta-in-context learning’s rapid prompt length increase

The model is not influenced by the added context. The answers are. The abstractions, or, what the LLM "knows", are in the model itself, whereas the answers are just byproducts.
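
To make that distinction concrete, here is a toy sketch. It is not how any real LLM or OpenAI API works; `ToyModel`, `with_context`, and `fine_tune` are hypothetical names invented for illustration. It loosely mirrors the g(f(H)) versus g(f)(H) notation from earlier in the thread: added context changes the answer while the stored weight stays the same, whereas fine-tuning yields a model with a different weight.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical stand-in for an LLM: one frozen weight is 'what it knows'."""
    weight: float

    def __call__(self, context: Tuple[float, ...], query: float) -> float:
        # The context shifts the answer, but nothing here writes to self.weight.
        bias = sum(context) / len(context) if context else 0.0
        return self.weight * query + bias


def with_context(model: ToyModel, context: Tuple[float, ...]) -> Callable[[float], float]:
    """Added-context case: the same model, called with a longer prompt."""
    return lambda query: model(context, query)


def fine_tune(model: ToyModel, delta: float) -> ToyModel:
    """Fine-tuning case: returns a new model whose weight differs."""
    return ToyModel(weight=model.weight + delta)


f = ToyModel(weight=2.0)

plain = f((), 3.0)                             # no context
prompted = with_context(f, (1.0, 3.0))(3.0)    # context changes the answer
tuned = fine_tune(f, 0.5)                      # a different model object

print(plain, prompted, f.weight, tuned.weight)
```

In this toy, `prompted` differs from `plain` even though `f.weight` is untouched; only `fine_tune` produces different stored parameters. Whether in-context effects in a real LLM count as "influencing the model" is exactly what the two commenters disagree on.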

There is an ongoing related discussion on LLMs lacking a world model. One could say LLMs do implement a computable model, albeit a dead, static, non-editable one.