by fpesce 35 days ago
I don't agree with the "no tsunami in sight" claim, unless you don't look at the 100+ bugs in Firefox and many more OSS projects, bunch of old unseen-before OpenBSD/Linux RCEs, and a few LPEs for Linux itself in just 2 or 3 weeks...

IMO, this does not sound like a marketing scare: there is a spike of vulnerability disclosures - high quality, low false positives - that can be sensed... It feels like we're speedrunning through a few years' worth of high-quality bug reports in just a few weeks.


Mythos isn’t released yet.

Anthropic noticed the trend of AI vulnerability scanning and started advertising Mythos, which is unreleased, as being very good at it.

Then they donated very large token budgets for using Mythos privately to several teams. Those teams used the free token spend for security research (that was the deal) and anything they found got attributed to Mythos, not the token budget.

Mythos looks like a good incremental model, but the PR team has done a great job of associating it with the current trend. So much so that comments like yours already associate the vulnerabilities found with this model, which isn’t even available yet.

Mythos hasn't been released yet, but there seems to be some evidence that GPT-5.5, which has been released, is already a touch better anyhow in some dimensions: https://www.mindstudio.ai/blog/gpt-5-5-vs-claude-mythos-cybe...

Close enough that you can probably get a good sense of Mythos' performance by using GPT-5.5.

One thing I noticed while using GPT-5.5 for this is that the ability of the model to turn the bug into an outright vulnerability is less relevant than you might intuitively think. All that is really necessary is for the model to point out that something is smelly, and you should just fix it. Turning it into a runnable exploit has very limited utility for the defender. It does turn heads and may get the attention of some otherwise reluctant people, but everything I found was obviously enough wrong that the exploit was just decorative.

An actual PoC is often very helpful in prioritizing getting the bug fixed, in demonstrating that the bug is real, and in providing something that devs can see happening in their debuggers.

The LPEs were not found with Mythos but with existing, publicly available models.

And also: they did an earlier run with Opus to discover bugs (like segfaults).

In February, Opus discovered a whole bunch of security related bugs, but didn’t exploit them.

Mythos, in turn, was fed these bugs and told to exploit them.

Not saying it’s not impressive, but it was literally told “here are all the places our metal detector says there may be gold, please find gold”.

There is a significant difference between being able to see one flaw and being able to chain together multiple disparate flaws, to be fair.

> bunch of old unseen-before OpenBSD/Linux RCEs,

AFAIK, the only thing it found in OpenBSD was a DoS?

Edit: For that matter, I'm not aware of any RCEs in Linux, only LPEs?

The whole thing started with a talk from Nicholas Carlini mentioning a remote, 20+ year old NFS vuln, IIRC.