
The model is, essentially, trained to confabulate an excuse in response to correction. That points to a major limitation: it is not learning to tell truth from falsehood, but rather learning what human evaluators like or dislike.

"ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows."

https://openai.com/blog/chatgpt/