| HN Mirror



	by GaggiX 503 days ago
	From what I see, DeepSeek-R1 seems to be better calibrated (better at knowing what it does and doesn't know) than any other model, at least on the HLE benchmark: https://lastexam.ai/
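
	For anyone unfamiliar with the term: "calibrated" means the model's stated confidence matches how often it is actually right. Here is a minimal sketch of one common way to measure that, binned expected calibration error. This is an assumption for illustration and may not be the exact metric HLE reports; the function name and the toy inputs are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error (illustrative; not necessarily HLE's metric).

    confidences: stated confidence per answer, each in [0, 1]
    correct:     whether each answer was right (bool)
    Returns the weighted mean of |accuracy - mean confidence| over the bins.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group answers into equal-width confidence bins; 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Weight each bin's gap by the share of answers that fall in it.
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Toy data: the model claims 90% confidence but is right 3 times out of 4.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
```

	Lower is better: a model that says "90% sure" and is right 90% of the time scores 0, while one that is always certain and always wrong scores 1.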