| HN Mirror



	by tonetegeatinst 75 days ago
	Mythos's research is easily replicated with other top models and even some open-source models, so this is nothing new. It's just like using fuzzers to test programs, or auditing code with test cases to find potential issues and then turning those into an exploit.

1 comment

uberdavid 73 days ago

The system card directly compares it to Opus 4.6 and other frontier models on the same evals. Cybench went from ~75% to 100%, and Firefox exploitation went from 1 bug unreliably to 4 bugs reliably. It's true there are many capable coding models out there, but the post is about why this specific jump in cyber capability happened.
