Hacker News new | ask | show | jobs
by NitpickLawyer 41 days ago
What's going on in this thread? It's weird how prevalent the negativity towards mythos is, and I'm not sure if it's people throwing the baby out with the bathwater or something more tinfoil-adjacent coordinated campaign. I also noticed this on a thread a few days ago, before the mozilla post. There were dozens of comments saying basically "mythos is vaporware".

I get the idea that they're using it for marketing. Of course they are. But to reduce it at "just marketing" feels either ill informed or outright wrong. Unless you have reasons to not believe the dozens of credentialed, well respected people in the field that have already shared their opinions after working with mythos. Plenty of them on all the social media sites.

And then there's the team at mozilla. They wrote a blog about this, and they've worked with anthropic before, using opus 4.6 and found and fixed 22 vulnerabilities. Then they worked with mythos and found and fixed 271 vulnerabilities. Unless you're going to accuse them of being shills, these are unquestionable numbers. The model is quantitatively better at this thing. And it matches what everyone is saying.

I think there are better things to accuse anthropic of, than that they are simply lying for marketing purposes. Of course they'll use this as a marketing campaign, but there's plenty of evidence out there that there is something there, that the model is simply better than previous generations at this. Don't fall for the cheap reductionist stuff, just because you don't like them, or feel that this is marketing fluff. It doesn't feel like a gimmick, even if it gets used to push their agenda. Something, something, propaganda often uses true statements as well.

4 comments

Here and on reddit, AI debugging is viewed as some weird shallow pattern-matching that obviously fails to spot real stuff and overload the maintainers. Instead of getting to "spotless record" of zero flaws, the people start rationalizing that "X is not a real bug" and inventing justifications for their(obviously bad) code, which is critique they can't accept from AI, only through human debate that they can't close with a WONTFIX. Once the bug is actually usable, the tune changes completely.
> Here and on reddit, AI debugging is viewed as some weird shallow pattern-matching that obviously fails to spot real stuff and overload the maintainers.

That's because that is what a lot of people did in the last years [1] to pad their resumes or to force developers to backport patches to older (but supported) kernel versions that wouldn't have gone in if they didn't have a CVE attached [2]. Maintainers have been legitimately swamped with low-quality spam for a very long time. Only recently, in the last few months, AI actually got "good enough", the problem is that maintainers still have to differentiate between AI slop by wannabes and by AI-assisted reports reviewed and refined by actual human professionals.

[1] https://www.zdnet.com/article/how-fake-security-reports-are-...

[2] https://opensourcewatch.beehiiv.com/p/linux-gets-cve-securit...

At the end of the day attackers don't give a fuck. "Waaa waaa, AI was bad 6 months ago so I'm going to throw a little fit" doesn't work when it's currently actively exploiting your shit. No one gives a damn if there are 4000 bullshit security PRs lined up. The one real RCE in there mean that everything you hold dear has already been carted off by nation states, and probably rediscovered by 3 or 4 other exploitation groups by this point.

It's time for all the little snowflake software writers to pull up their pantaloons and realize that Linus' vision has become real. With enough AIs all security bugs become shallow. And that software affects the real word, real money, and real people in it. That they are also under attack by well financed groups with rather evil motivations. If I'm attacking some group using your software (such as another nation) I'm going to flood the fuck out of your PR system till you give up hope and die. I'm going to make you attack your contributors. I'm going to sow confusion so I have the maximum amount of time to lay waste to my enemies and profit to the max.

The internet is hostile. Software is hostile. There are sharks looking to eat you.

Time to face that fact.

> And then there's the team at mozilla

And then there’s the team at curl. Don’t fall for the cheap marketing stuff just because you like them

Everything points to Mythos being marginally better and nobody being able to afford to run it.

Many can afford it, many companies are easily spending 15-30K a month in tokens per staff.
> Unless you have reasons to not believe the dozens of credentialed, well respected people in the field that have already shared their opinions after working with mythos.

Exactly the same argument was made about o3-preview, lol. But anyway, do they talk about all domains where Mythos did the leap in capabilities (math and other research, ML, SWE) or only about cybersec?

> And then there's the team at mozilla. They wrote a blog about this, and they've worked with anthropic before, using opus 4.6 and found and fixed 22 vulnerabilities. Then they worked with mythos and found and fixed 271 vulnerabilities

Those 22 bugs were found in February, at the time when Mozilla were doing first small-scale experiments with Opus 4.6 (i.e. no proper integration into workflow, likely relatively simple harness, likely only small part of codebase was covered). You can't compare "22 bugs which were found during very early attempts to apply AI" and "271 bugs which were found during large-scale codebase scanning with properly configured AI". The fact that Mozilla is pretty vague about "contribution of other AI models" makes it even worse.

> Unless you're going to accuse them of being shills, these are unquestionable numbers. The model is quantitatively better at this thing

They found another ~150 bugs after their first announce, and only like ~35 were found by Mythos. It's already very sharp drop in contribution.

> I think there are better things to accuse anthropic of, than that they are simply lying for marketing purposes.

Anthropic already used a lot of "technically correct but in fact deceiving" statements in Mythos system card. They are playing both "It's too dangerous" and "We don't have enough compute for that super model" at the moment (it's usually a big red falg). Opus 4.7 (which was likely supposed to be "Opus 5.0", given various facts) is a disaster from various points of views. Of course people don't really believe Anthropic.

HN readers really like the idea of Daniel, he's sticking it to the man... by doing free publicity and work for the man haha