Hacker News new | ask | show | jobs
by kajman 44 days ago
I dismissed the earlier non-technical blog post as shameless product boosterism for Anthropic. The linked hacks blog (which is a better source than this article) is a welcome release. It's hard to deny there's something real to this now, I think. Mozilla's internal definition of a "vulnerability" is also probably more widely applied than what many would intuit, but it is good that these issues are being taken seriously and fixed.
2 comments

At the same time other companies like AISLE are matching Mythos on vulnerabilities using older models but their own harnass: https://aisle.com/blog/aisle-matches-anthropic-mythos-on-fre...

So while Mythos certainly is real I think you could do the same with Deepseek pro, GPT 5.5 etc...

I used to work with a guy who would always say "if you're looking for trouble, you are going to find it"

When I hear that "we found X bugs using some new tool", where the standard for bugs is low and doesn't neccessarily require user impact in realistic scenarios, I think to myself- duh! You went looking for bugs, of course you found them.

For a sufficiently complicated product, in my experience, you don't have to look far.

Sure, but the bugs were found in an automated process. They just let an LLM scan. That's very impressive finding 100s of needed code changes. And it's even better if those needed code changes are bugs / vulnerabilities. The part no one is talking about comes from the bill. I'm sure Anthropic let Mythos analyze possibly for US$10,000s in tokens. A similar phenomenon happened back when an LLM scored well on some math olympiad competition. Yeah, it got all the answers right, but it was a frontier model running for 8 hours straight. That'll hurt the budget quite a bit. We're likely not at a stage where big corporate systems can just throw Mythos at it willy nilly for a complete analysis unless they have a ton of money.
Well it helps if 'looking for bugs' doesn't cost $300 per hour per set of eyes.
how much does it cost? my understanding of Mythos is that it runs a lot to find issues
The things I’ve read from various open source orgs with access to it is that Anthropic is giving them unmetered access for now as part of Glasswing. I’d bet that the corporate partners have to pay though.
> if you're looking for trouble, you are going to find it

That's the "'No Way to Prevent This,' Says Only Nation Where This Regularly Happens" of unsafe languages.

There are huge swathes of problems we know how to categorically prevent, but some people won't do it because they're more comfortable believing it was never preventable than accepting any culpability for not preventing it previously.

As the Hacks.Mozilla article notes: "We began with small-scale experiments prompting the harness to look for sandbox escapes with Claude Opus 4.6. Even with this model, we identified an impressive amount of previously-unknown vulnerabilities which required complex reasoning over multiprocess browser engine code."
Agreed. The earlier blog post did not explicitly claim this, but I think casual viewers were prompted to believe that the Magic of Mythos (TM) went and found (and fixed??) a bunch of vulnerabilities with minimal human guidance, and even contrasted this with their fuzzing infrastructure and made it sound (to me) like it was casting shade on it.

This new post makes it pretty clear that this was all bolted on-top of their existing fuzzing infrastructure, and really just used to get more and better initial hits that a very skilled team is looking at. I assume Anthropic was giving them a very good deal on inference for the positive PR, but I believe these other reports and suspect Mozilla did not really need them.

Wasn't AISLE only able to find the same bugs when it was shown only the known faulty code? The worrying part about Mythos isn't the fact that it can find bugs. The worrying part is Mythos being able to find them on its own across entire code base as vast as Firefox then write exploits for what its found with a very basic prompt.

The skill required to find then create zero days is quickly approaching the floor.

I think they split the codebase in smaller files or modules and then tell the AI there's a bug in this particular file and to go find it.

Then they loop over a codebase like this. This way you always point a model at a 'known' bug. And I assume a smaller context window helps with quality.

Not entirely sure it's obviously proprietary.