by bpodgursky 720 days ago
ChatGPT is not trained to "escalate" an issue because there's nobody to escalate to. You can get this to happen pretty reliably via prompting, and with even light retraining basically 100%.

And here's the thing: most front-line customer service is also clueless about difficult problems. The IRS cannot put 10,000 seasonal experts on the line; they are going to hire barely trained part-time accountants who also flub hard questions.


But human brains have a more developed and reliable means of expressing uncertainty, which is still a challenge for LLMs.

e.g. part-time front-line customer service will prefix a statement with "uhhh..." if they don't actually know what they're talking about, even if they do have trouble answering accurately.

> e.g. part-time front-line customer service will prefix a statement with "uhhh..." if they don't actually know what they're talking about, even if they do have trouble answering accurately

You can literally prompt GPT-4 with "Prefix a statement with uhhhh if you don't know what you are talking about" and get similar behavior.

That doesn't mean the 'uhhh...' is related to the certainty of the remainder of the response.

I literally just tested your prompt with the question "is the sky blue?", and ChatGPT prefixed the response with "uhhh..."

These models create the illusion of thought by statistically stringing words together, but they don't actually think or exercise judgment of their own.

Edit: After digging into this for a few minutes, I challenge you to try prompting an LLM to judge the certainty of its own responses. The results I am getting are even worse than I thought they would be.

What model are you using? Here's 4o https://chatgpt.com/share/8815a841-d06b-4876-9d3e-7f5f4f1d7b....

Custom instructions: "If you aren't confident in your answer, prefix your response with "Uhhhhh". Otherwise answer the same as normal."

I was also using 4o

So... 4o is not confident that only humans qualify as dependents?

I think even a very junior front-line customer service rep should be able to answer that one confidently.

It seems that what the model is actually doing is prefixing "Uhhhh" when your question is leading in a way that doesn't match the data it has. The fact that the IRS requires dependents to be humans should be answerable with extremely high confidence, and that fact is without a doubt in its training data... but again, the model doesn't actually experience human confidence or uncertainty.

It's not confident because OA2143 is a fake form I made up.