HN Mirror



	by uberdavid 73 days ago
	The system card compares the model directly against Opus 4.6 and other frontier models on the same evals. Cybench went from ~75% to 100%, and Firefox exploitation went from exploiting 1 bug unreliably to exploiting 4 bugs reliably. It's true there are many capable coding models out there, but the post is about why this specific jump in cyber capability happened.