by uberdavid
73 days ago
The system card compares directly against Opus 4.6 and other frontier models on the same evals. Cybench went from ~75% to 100%, and Firefox exploitation went from 1 bug unreliably to 4 bugs reliably. It's true there are many capable coding models out there, but the post is about why this specific jump in cyber capability happened.