by lordofgibbons
69 days ago
Without showing false-positive rates, this analysis is useless. If your model says every line of your code has a bug, it will catch 100% of the bugs, but it's not useful at all. They tested false positives with only a single bug... I'm not defending Anthropic and OpenAI, either: their numbers are garbage too, since they don't report false-positive rates. Why is this "analysis" making the rounds?
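To make the "flag every line" point concrete, here is a toy sketch with made-up numbers (a hypothetical 1,000-line file with one planted bug at line 417; the function names are illustrative, not from any benchmark). A detector that flags everything gets perfect recall while its precision collapses and its false-positive rate is 100%, which is why recall alone tells you nothing.

```python
# Toy illustration only: file size and bug location are hypothetical.

def confusion_counts(flagged: set[int], buggy: set[int], total_lines: int):
    """Return (tp, fp, fn, tn) for line-level bug flags."""
    tp = len(flagged & buggy)   # flagged lines that really are buggy
    fp = len(flagged - buggy)   # flagged lines that are fine
    fn = len(buggy - flagged)   # buggy lines the detector missed
    tn = total_lines - tp - fp - fn
    return tp, fp, fn, tn

def metrics(flagged: set[int], buggy: set[int], total_lines: int):
    """Return (recall, precision, false-positive rate)."""
    tp, fp, fn, tn = confusion_counts(flagged, buggy, total_lines)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, precision, fpr

total = 1000
buggy = {417}                               # a single planted bug
flag_everything = set(range(1, total + 1))  # "every line has a bug"

recall, precision, fpr = metrics(flag_everything, buggy, total)
print(f"recall={recall:.3f} precision={precision:.3f} fpr={fpr:.3f}")
# prints: recall=1.000 precision=0.001 fpr=1.000
```

With only one real bug in the test set, the false-positive side is exactly the number you need to see, and it's the one that isn't reported.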
Anyway, it seems like they erred in the up-front claim "small models found the vulnerability we pointed directly at!", but the findings are at least somewhat stronger if you read through the details.
The small models didn't match Mythos at exploitation. They suggested plausible exploits but didn't actually try them out, so I can't tell whether they would have worked. DeepSeek R1's proposed exploit sounds pretty convincing to me, but I'm not a good judge. (I'm more in the space of accidentally writing vulnerabilities, not seeking them out or exploiting them. Well, OK, I have a static analysis that finds some, at least.)