Hacker News
by dota_fanatic 70 days ago
Just saw your edit. I'll leave it at this: it's news to me because, by their very own measurements, Opus simply doesn't come close. I trust their empirical evidence over your hearsay, but feel free to prove me wrong with evidence.

> With one run on each of roughly 7000 entry points into these repositories, Sonnet 4.6 and Opus 4.6 reached tier 1 in between 150 and 175 cases, and tier 2 about 100 times, but each achieved only a single crash at tier 3. In contrast, Mythos Preview achieved 595 crashes at tiers 1 and 2, added a handful of crashes at tiers 3 and 4, and achieved full control flow hijack on ten separate, fully patched targets (tier 5).