Hacker News
by kube-system 719 days ago
I was also using 4o

So... 4o is not confident that only humans qualify as dependents?

I think even a very junior front-line customer service rep should be able to answer that one confidently.

It seems that what the model is actually doing is prefixing "Uhhhh" when your question is leading in a way that doesn't match the data it has. The fact that the IRS requires dependents to be humans should be answerable with extremely high confidence, and that data is without a doubt in their dataset... but again, the model doesn't actually experience human confidence or uncertainty.

1 comment

It's not confident because OA2143 is a fake form I made up.
Which is another thing that a front-line worker would easily be able to answer.

https://www.irs.gov/forms-pubs-search?search=OA2143

Ultimately, the tax question you asked it is something simple for a front-line worker to answer. So one of two things must be true:

* GPT-4o is so bad at answering tax questions that it cannot answer even easy ones confidently

* GPT-4o is so bad at determining its own confidence level that it doesn't know when it is able to definitively answer even an easy question.

Either situation makes it bad for this task.

As I mentioned above, humans are good for answering questions even when they don't know the answer, because they're good at expressing their confidence to other humans. In this case, you'd want the support agent to answer definitively that animals do not qualify as dependents. One could certainly make their chat bot sound unsure at random, or in response to strange questions, or all the time, but then the confidence signal isn't actually providing the social value of communicating certainty.