Hacker News
by yanis_t 15 days ago
Is there any evidence that Mythos is qualitatively better than Opus 4.x?

I'm afraid the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.

Is this why both Anthropic and OpenAI are rushing for IPOs this year?

4 comments

It is quantitatively better at finding and exploiting vulnerabilities. Pretty wild that everyone here is just in denial about that when folks who have used it say it's as good as the hype.

Cloudflare wrote a genuinely good piece and found a bunch of bugs: https://blog.cloudflare.com/cyber-frontier-models/

wolfSSL is security-focused, and Mythos found a novel exploit: https://www.wolfssl.com/how-claude-mythos-preview-helped-har...

You can pretend that it's all smoke and mirrors, but that just doesn't match up with reality: https://www.paloaltonetworks.com/blog/2026/05/defenders-guid...

From what I've read so far, it's less about Mythos being much better at tasks in isolation.

Security-wise, it's about being able to find and chain multiple vulnerabilities to actually create viable exploits.

So I would imagine that if you were using it for regular software development, you might not feel that it's that different unless you used it in a particular way?

There seem to be a number of people here on HN who make their money in old-style cybersecurity and who are under the delusion that LLMs are just going to go away and that it will be back to business as usual for their cash cow.

I work with a number of people in security who have come around, and while they still think LLMs are rather garbage at architecture, they see how good the models we can access now are at finding security issues. These models can chain together wildly different concepts and turn them into working exploits.

> I'm afraid the usual mantra of "we just need more scale", which worked well for attracting investment, is not working anymore: bigger models provide marginal improvements while naturally getting much more expensive to run.

It's super interesting to hear this refrain on HN, and it is alarmingly common. Anthropic released benchmark numbers on Mythos, as they have for all of their models. Once models become public, people evaluate them in a myriad of ways. We have had reliable scaling laws for years, and they still hold. The Epoch capability index continues to grow exactly as expected. Where does this idea come from?

As for cost, the price per token at a given level of performance drops by up to 40x per year.
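
To put that 40x figure in perspective, here is a rough sketch of how it compounds. The $10 per million tokens starting price is a made-up number, 40x is the upper bound quoted above rather than a measured rate, and `cost_after` is just an illustrative helper:

```python
def cost_after(initial_cost: float, annual_drop: float, years: float) -> float:
    """Cost after `years`, assuming it falls by a constant factor every year."""
    return initial_cost / (annual_drop ** years)


# Hypothetical starting price in dollars per million tokens (an assumption).
START_PRICE = 10.0
# Upper-bound yearly drop factor taken from the comment above.
ANNUAL_DROP = 40.0

for years in (1, 2, 3):
    price = cost_after(START_PRICE, ANNUAL_DROP, years)
    print(f"after {years} year(s): ${price:.8f} per million tokens")
```

Even at a much smaller factor than 40x, the same formula shows the price at a fixed capability level falling quickly, which is a separate question from what the newest, largest model costs to run.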

Mythos numbers are effectively irreproducible aside from cherry-picked approvals.
Yes, absolutely. However, the benchmarks they did release numbers for are key ones, and the leaps were very large. It's absolutely possible that they either lied or that the full picture is much muddier, but based on the numbers they do show, it's hard for me to imagine a likely scenario that would produce that.
It probably isn't, at least in terms of security or memory safety. The current models can already sniff out all memory vulnerabilities with relative ease; you can't really beat that.
Have you read Firefox's findings? They found it to be qualitatively improved over Opus, and they have published several of the resulting CVEs as well as more detailed numbers.
They also seem to point to it being more the harness than the model itself.
Really? They mention that Opus 4.7 in the same harness found like 1% of the bugs that Mythos found.