Ultimately, the tax question you asked it is a simple one for a front-line worker to answer. So one of two things must be true:
* either GPT-4o is so bad at answering tax questions that it cannot even answer easy ones confidently
* or GPT-4o is so bad at determining its own confidence level that it doesn't know when it is able to definitively answer even an easy question.
Either situation makes it bad for this task.
As I mentioned above, humans are useful for answering questions even when they don't know the answer, because they're good at expressing their confidence to other humans. In this case, you'd want the support agent to answer definitively that animals do not qualify as dependents. One could certainly make a chatbot hedge at random, or in response to strange questions, or all the time, but then the confidence signal no longer provides the social value of communicating certainty.
https://www.irs.gov/forms-pubs-search?search=OA2143