How RL Reward Hacking Made Claude Mythos a Zero-Day Hunter


	How RL Reward Hacking Made Claude Mythos a Zero-Day Hunter (uberdavid.substack.com)
	2 points by uberdavid 75 days ago

1 comments

tonetegeatinst 75 days ago

Mythos's research is easily replicated with other top models, and even some open-source models, so this is nothing new.

This is just like using fuzzers to test programs, or auditing code with test cases to find potential issues and then turning those into an exploit.


uberdavid 73 days ago

The system card directly compares Mythos to Opus 4.6 and other frontier models on the same evals. Cybench went from ~75% to 100%, and Firefox exploitation went from 1 bug exploited unreliably to 4 bugs exploited reliably. It's true there are many capable coding models out there, but the post is about why this specific jump in cyber capability happened.
