by GaggiX 503 days ago
From what I can see, the DeepSeek R1 model seems to be better calibrated (it knows what it knows) than any other model, at least on the HLE benchmark: https://lastexam.ai/
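
For anyone unsure what "calibrated" means here: a model is well calibrated when its stated confidence matches how often it is actually right. Below is a rough sketch of one common way to score that, a binned RMS calibration error. The function name, the 10-bin default, and the inputs are my own illustration; I am not claiming this is exactly how the benchmark computes its number.

```python
import math

def rms_calibration_error(confidences, correct, n_bins=10):
    """Binned RMS calibration error (hypothetical helper, for illustration).

    confidences: stated confidence per answer, each in [0, 1]
    correct:     whether each answer was actually right (bools)
    Returns 0.0 for a perfectly calibrated set of answers; larger is worse.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one answer")

    # Group answers into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    # Weighted mean of the squared gap between confidence and accuracy per bin.
    total = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(conf for conf, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        total += (len(b) / n) * (mean_conf - accuracy) ** 2
    return math.sqrt(total)
```

So a model that says "90% sure" on four answers and gets three right has a gap of 0.15 in that bin, while one that says "100% sure" and is always wrong scores 1.0.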