Hacker News
by zug_zug 20 days ago
Well then it shows that these models are using widely disparate training sets and have high confidence even when they shouldn't.

Questions like "is mouthwash effective" presumably have one solid data source -- medical journals.

2 comments

But the prompt didn't give the models the option to say "I don't know", so it wasn't a measure of their confidence.
I mean, that's true, but I don't think that's realistically what's going on when one model gives an unqualified "Yes" and the other gives an unqualified "No."

You can argue the study isn't as case-closed-decisive as we'd ideally like, but it's certainly evidence. It's probably hard to design a better study.

What are you talking about? The models were not ALLOWED to express confidence (or the lack thereof). They were explicitly told to give a single label, and in most cases all of them were correct, depending on additional context that they would surely have provided, especially with access to the internet (which some didn't have). This is just silly.