by alfalfasprout 720 days ago
I can't think of a worse use for an LLM...

You really under-estimate how googleable 97% of customer service calls are. The average person does not make any attempt to solve their own problems before calling customer support. That's just life.

Yes, in an ideal world we would have a live customer support representative for every function in every facet of society, but there are a limited number of human beings available for such things, and this is a pretty reasonable place to do a first triage using an LLM for very simple questions.

One of the most observed weaknesses of LLMs is that they have no clue when they're dealing with a difficult problem. There's no doubt that throwing an LLM at the problem would likely fix many simple issues. The question is whether or not it can accurately triage a difficult issue, which is a task they tend to struggle with.

When accuracy matters, answering a question incorrectly puts a person in an even worse situation than simply failing to answer the question.

ChatGPT is not trained to "escalate" an issue because there's nobody to escalate to. You can get this to happen pretty reliably via prompting, and with even light retraining, basically 100% of the time.

And here's the thing: most front-line customer service is also clueless about difficult problems. The IRS cannot pull 10,000 seasonal experts on the line; they are going to hire barely-trained part-time accountants who also flub hard questions.

But human brains have a more developed and reliable means of expressing uncertainty, which is still a challenge for LLMs.

e.g. part-time front-line customer service will prefix a statement with "uhhh..." if they don't actually know what they're talking about, even if they do have trouble answering accurately.

> e.g. part-time front-line customer service will prefix a statement with "uhhh..." if they don't actually know what they're talking about, even if they do have trouble answering accurately

You can literally prompt GPT4 "Prefix a statement with uhhhh if you don't know what you are talking about" and get similar behavior.

That doesn't mean the 'uhhh...' is related to the certainty of the remainder of the response.

I literally just tested your prompt, with the question "is the sky blue?" and chatgpt prefixed the response with "uhhh..."

These models create the illusion of thought by statistically stringing words together, but they don't actually think or perform judgement of their own.

Edit: After digging into this for a few minutes, I challenge you to try prompting an LLM to judge the certainty of its own responses. The results I am getting are even worse than I thought they would be.
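
For anyone who wants to try this themselves, here is a rough sketch of how one might check whether the "uhhh" prefix tracks correctness at all. Everything in it is an assumption for illustration: `ask_model` is a hypothetical callable you would wire up to whatever LLM you are testing, `fake_model` is a canned stand-in so the snippet runs on its own, and the substring match against the expected answer is a crude correctness check, not a real grader.

```python
from typing import Callable, Dict, Iterable, Tuple

HEDGE = "uhhh"  # the prefix the prompt asks the model to use when unsure


def hedge_report(ask_model: Callable[[str], str],
                 cases: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often the hedge prefix appears on right vs. wrong answers."""
    counts = {"hedged_right": 0, "hedged_wrong": 0,
              "plain_right": 0, "plain_wrong": 0}
    for question, expected in cases:
        reply = ask_model(question).strip().lower()
        hedged = reply.startswith(HEDGE)
        # Crude check: the expected answer appears somewhere in the reply.
        right = expected.lower() in reply
        key = ("hedged" if hedged else "plain") + "_" + ("right" if right else "wrong")
        counts[key] += 1
    return counts


def fake_model(question: str) -> str:
    # Hypothetical stand-in that hedges on everything, right or wrong.
    canned = {
        "Is the sky blue?": "uhhh... yes, the sky is blue.",
        "What is 2 + 2?": "uhhh... 5.",
    }
    return canned[question]


cases = [("Is the sky blue?", "yes"), ("What is 2 + 2?", "4")]
print(hedge_report(fake_model, cases))
```

If the hedge carried real information, the hedged answers should be wrong noticeably more often than the plain ones. A model that hedges on everything, like the stand-in here, puts every answer in the hedged buckets and tells you nothing.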

It's one of the reasons why I stopped joining Facebook groups. Every day the same ^%$#^#%$ post by a [adjective] [derogatory term] who couldn't be bothered to use Google / Bing / etc.