Hacker News
by killerdhmo 1284 days ago
You’d think it would be smart enough to know that for this particular question, the details of the answers have not changed since 2021.
1 comment

The model is trained, essentially, to fabulate an excuse in response to correction. That points to a major limitation: it is not learning to tell truth from falsehood, but rather what human evaluators like or dislike.

"ChatGPT sometimes writes plausible-sounding but incorrect or nonsensical answers. Fixing this issue is challenging, as: (1) during RL training, there’s currently no source of truth; (2) training the model to be more cautious causes it to decline questions that it can answer correctly; and (3) supervised training misleads the model because the ideal answer depends on what the model knows, rather than what the human demonstrator knows."

https://openai.com/blog/chatgpt/