| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ZrArm 82 days ago

> Unless you have reasons to not believe the dozens of credentialed, well respected people in the field that have already shared their opinions after working with mythos.

Exactly the same argument was made about o3-preview, lol. But anyway, do they talk about all domains where Mythos did the leap in capabilities (math and other research, ML, SWE) or only about cybersec?

> And then there's the team at mozilla. They wrote a blog about this, and they've worked with anthropic before, using opus 4.6 and found and fixed 22 vulnerabilities. Then they worked with mythos and found and fixed 271 vulnerabilities

Those 22 bugs were found in February, at the time when Mozilla were doing first small-scale experiments with Opus 4.6 (i.e. no proper integration into workflow, likely relatively simple harness, likely only small part of codebase was covered). You can't compare "22 bugs which were found during very early attempts to apply AI" and "271 bugs which were found during large-scale codebase scanning with properly configured AI". The fact that Mozilla is pretty vague about "contribution of other AI models" makes it even worse.

> Unless you're going to accuse them of being shills, these are unquestionable numbers. The model is quantitatively better at this thing

They found another ~150 bugs after their first announce, and only like ~35 were found by Mythos. It's already very sharp drop in contribution.

> I think there are better things to accuse anthropic of, than that they are simply lying for marketing purposes.

Anthropic already used a lot of "technically correct but in fact deceiving" statements in Mythos system card. They are playing both "It's too dangerous" and "We don't have enough compute for that super model" at the moment (it's usually a big red falg). Opus 4.7 (which was likely supposed to be "Opus 5.0", given various facts) is a disaster from various points of views. Of course people don't really believe Anthropic.