by jarulraj 978 days ago
As we do not have ground truth, we only checked accuracy qualitatively, with no quantitative metrics. We did find a significant drop in accuracy with GPT-3.5 compared to GPT-4.

Are you measuring accuracy with data wrangling prompts? Would love to learn more about that.

1 comment

Everything I do now is classification, and AUC-ROC is my metric. For your problem my first thought is an up-down accuracy metric, but the tricky part is deciding whether to accept both "United States" and "USA" as a correct answer. The trouble of dealing with that is one reason I stick to classification problems.
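
To make the "United States" vs. "USA" issue concrete, here is a minimal sketch of an up-down (exact-match) accuracy metric that canonicalizes answers before comparing them. It assumes you do have a small labeled sample to score against. The alias table and the function names `canonicalize` and `exact_match_accuracy` are made up for illustration; a real alias table would have to come from your own data.

```python
# Hypothetical alias table mapping variant spellings to one canonical form.
# These entries are illustrative only, not a complete or authoritative list.
ALIASES = {
    "usa": "united states",
    "us": "united states",
    "u.s.": "united states",
    "united states of america": "united states",
}


def canonicalize(answer: str) -> str:
    """Lowercase, collapse whitespace, then map known aliases to one form."""
    key = " ".join(answer.strip().lower().split())
    return ALIASES.get(key, key)


def exact_match_accuracy(predictions, references):
    """Fraction of predictions that equal their reference after canonicalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one labeled example")
    hits = sum(
        canonicalize(p) == canonicalize(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


if __name__ == "__main__":
    preds = ["USA", "Canada", "France"]
    refs = ["United States", "canada ", "Germany"]
    print(exact_match_accuracy(preds, refs))
```

The alias table is where the judgment calls end up: every pair you decide to treat as equivalent has to be listed, which is the extra work that classification labels avoid.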

I'm skeptical of any claim that "A works better than B" without some numbers to back it up.