Hacker News
by simonw 20 days ago
But the prompt didn't give the models the option to say "I don't know", so it wasn't a measure of their confidence.
1 comment

I mean, that's true, but I don't think that's realistically what's going on when one model gives an unqualified "Yes" and the other gives an unqualified "No."

You can argue the study isn't as case-closed-decisive as we'd ideally like, but it's certainly evidence. It's probably hard to design a better study.