Hacker News
by satisfice 97 days ago
I call this self-repudiation. I ran a systematic experiment on this exact question a couple of years ago. I found that ChatGPT 3.5 frequently self-repudiated, whereas 4.0, under identical circumstances, rarely did.

These experiments are a bit expensive to run because you have to read every response yourself to judge whether the model repudiated itself. Sometimes the repudiation is subtle.

Also, behavior changes with the exact wording of the question.