Hacker News
by GaggiX, 503 days ago
From what I see, the DeepSeek R1 model seems to be better calibrated than any other model (it knows what it knows, i.e. its stated confidence tracks how often it is actually right), at least on the HLE benchmark:
https://lastexam.ai/
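
For anyone unsure what "calibrated" means here, a rough sketch of one common way to measure it, binned expected calibration error. I'm not claiming this is the exact metric HLE reports; the function name, the 10-bin default, and the toy numbers are all my own assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    confidences: the model's stated probability of being right, in [0, 1].
    correct: booleans, whether each answer was actually right.
    (Hypothetical helper, not taken from any benchmark's code.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group predictions into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's gap by the share of predictions it holds.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy data: a model that says 90% but is right half the time,
# versus one that says 50% and is right half the time.
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
well_calibrated = expected_calibration_error([0.5] * 10, [True] * 5 + [False] * 5)
```

Lower is better: both toy models have the same accuracy, but only the second one "knows what it knows".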