by SwellJoe
14 days ago
I've been benchmarking various models on finding hard security bugs, and my faith in Haiku (and even Sonnet) has dropped precipitously in the process. Self-hosted Qwen 3.6 27B consistently outperforms both at finding security bugs, which was a shocking result. I expected Qwen to be around Haiku level, maybe a little worse, and I definitely expected it to be worse than Sonnet. DeepSeek and MiMo also perform much better than Haiku and Sonnet, near Opus/GPT 5.5 levels, at a fraction of the cost. There's seemingly no reason to ever use Haiku or Sonnet unless you're getting them for free or as part of a subscription (that you don't usually saturate).