| HN Mirror



	by zug_zug 20 days ago
	Well then, it shows that these models are using widely disparate training sets and have high confidence even when they shouldn't. Questions like "is mouthwash effective" presumably have one solid data source -- medical journals.

2 comments

simonw 20 days ago

But the prompt didn't give the models the option to say "I don't know", so it wasn't a measure of their confidence.


zug_zug 19 days ago

I mean, that's true, but I don't think that's realistically what's going on when one model gives an unqualified "yes" and the other gives an unqualified "no."

You can argue the study isn't as case-closed-decisive as we'd ideally like, but it's certainly evidence. It's probably hard to design a better study.


TaupeRanger 20 days ago

What are you talking about? The models were not ALLOWED to have confidence (or the lack thereof). They were explicitly told to give a single label, and in most cases, all of them were correct depending on additional context they would surely have provided, especially with access to the internet (which some didn't have). This is just silly.
