Would have like to see analysis against curl repo where the commit level is one day after the Mythos training data cutoff. And disable access to the internet.
Mythos is either dangerous or not. We are taking dangerous to mean that the number of vulns it finds will be much greater than bugs found with available tools.
Since mythos found only one additional vuln, and since x+1 is not much greater than x, it follows that mythos is not dangerous per the definition above.
It doesn’t follow because the results for curl don’t necessarily generalize to other codebases. It’s evidence against Mythos being particularly dangerous, but it’s just one datapoint.
It doesn’t invalidate the other security bugs Mythos allegedly found in other codebases.
I don't think I understand what you mean, the "not particularly dangerous" comment was in relation to the vulnerability that was found right ? Surely they would know what constitutes a lower severity level.
My guess is that it is in category of "you are holding it wrong". Still worth fixing, but requires very specific user input for example. Or very weird scenario. Or in some less used protocol or flag combination.
Sure, but isn't it a verdict on Mythos compared to other models?
If so, it would still follow. "Most software" isn't analyzed as much as curl, by either other tooling or other models, that might well find close to the same as Mythos did. As such, Mythos then isn't especially/particularly dangerous.
Curl is currently receiving a record number of high-quality bug/vuln reports (a rather sharp change from the earlier slop inundation), so it’s not like there’s nothing to find. Many or most of these are presumably found by human experts assisted by AI tools, but if Mythos were truly revolutionary, it should be able to find such issues on its own.
> I did a quick unscientific poll on Mastodon to see if other Open Source projects see the same trends and man, do they! Friends from the following projects confirmed that they too see this trend. Of course the exact numbers and volumes vary, but it shows its not unique to any specific project.