by andai
3 hours ago
> Anthropic's headline cyber evaluations mostly measure offensive progress (exploits, PoCs, challenges); our benchmark tests whether a model can actually generate safe code, and there Fable 5 did not stand out.

The model isn't allowed to think about security. I heard several people here mention that if it starts thinking about security -- e.g. writing tests related to it -- the safety filter flags it and downgrades to Opus. So it's actually not allowed to make your code secure.

Model is definitely better than Opus but Anthropic's delivering a pretty terrible experience.