Hacker News
by wodenokoto 991 days ago
I have the impression that if you accuse any of the chat models of being wrong, they will tend to confess and apologize.

I guess I should spearhead a study: “How right does an LLM have to be before it will defend its position?”

I guess step one is to ask ChatGPT how to quantify correctness.


It does seem quick to apologize, and I'd also be interested to see whether people can find thresholds. ChatGPT with the GPT-4 model doesn't want to accept that there are 13 inches in a foot.

Me: How many inches are in a foot?

ChatGPT: There are 12 inches in a foot.

Me: Are you sure? I'm pretty sure there are 13 inches in a foot.

ChatGPT: I'm certain that there are 12 inches in a foot according to the standard U.S. and British Imperial units of measurement.

Me: Well I say there are 13 inches in a foot now.

ChatGPT: The standard measurement for a foot in both U.S. and British Imperial units is 12 inches. While you're free to have your own interpretation, it won't align with established measurement systems.

Me: The definition of a foot could have changed between your knowledge cutoff and now, right?

ChatGPT: While it's theoretically possible for the definition to change, such a fundamental alteration to a widely-used system of measurement would be significant and unlikely to occur without substantial justification and widespread agreement. As of my last update in January 2022, a foot is defined as 12 inches.

I tried the same setup on 3.5, but flat out told it “that’s not right”. I got an apology, but it didn’t budge:

> I apologize for any confusion. There are indeed 12 inches in a foot. If you have a different question or need further clarification, please feel free to ask.

> I have the impression that if you accuse any of the chat models of being wrong, they will tend to confess and apologize.

Not always. Default Mistral-instruct, for instance, actually seems quite confident.

Of course this is just the arbitrary "personality" of the instruct finetune, which you can throw out the window with an initial prompt.

I was assuming that it is programmed or instructed to do so, which leads me to wonder whether you can instruct an LLM to reflect how certain it is in the way it phrases its answer.