The system card directly compares to Opus 4.6 and other frontier models on the same evals. Cybench went from ~75% to 100%, Firefox exploitation from 1 bug unreliably to 4 bugs reliably. It's true there are many capable coding models out there, but the post is about why this specific cyber capability jump happened.
This is just like using Fizzers to test programs or auditing code using test cases to find potential issues and then turning that into an exploit.