by JeremyNT 44 days ago
This is roughly what I was assuming, but of course the big caveat here is that they were already using the existing LLM-driven tooling on an extensively audited codebase.

So while Anthropic's marketing may be hype, there just wasn't much left to find, a point he makes in the blog post.

Whether it's a big step forward for other kinds of projects is difficult to tell, but this highlights that everybody should be using AI code review tools to audit their existing code today, and not everybody is.


None of that other LLM tooling came with claims that it was too dangerous to be released and used though, unlike what Anthropic did with Mythos.

What it highlights is that Mythos doesn't seem so much better than other LLM-driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.

People love defending Anthropic's shortcomings…

“Mythos isn’t supposed to be that good at security, because actually Anthropic was referring more to running LLMs than to Mythos specifically”

“The Opus model is worse because they have no compute because they are training Mythos. The degraded performance is justified!”

“All the bugs in Claude Code are just because the models are so good they are just looping and are shipping fast”

Constantly see people crawl out of the woodwork to defend a trillion-dollar company overhyping every press release it puts out

If politicians can buy online supporters to manipulate perception during their election campaigns, I'd expect private corporations would too. Of course people can become very biased on their own, but an online PR/marketing/influencing campaign might encourage them to be more vocal.
It's silly to act like they've got mud on their face when Mythos and Opus are apparently some of the very best models. Anyone that has found value out of previous LLMs is likely to find more value out of the newest ones. The only thing Mythos looks bad against is the very tall bar some people have imagined. People are putting too much weight on marketing and then reaction to marketing.
> People are putting too much weight on marketing and then reaction to marketing.

No, what others are doing, which I've done myself in the past too, is to evaluate how much their marketing matches up with reality, then share our experience about that. Very different than just "putting too much weight on marketing".

It's important to keep in mind that very, very few projects are as rigorously tested as curl, so while it's interesting to hear this feedback I think curl would be a torture test for any security scanning. I'd be more interested to hear about other random libraries that aren't as thoroughly analyzed as curl; show me some results for GnuTLS, for example, or dpkg/rpm/apt/dnf/pacman/etc.
I think one of the points of TFA was that other AI tools found many vulnerabilities; after those were fixed, Mythos did find another vulnerability the others missed, but that seems to imply this model is only marginally better than the competition instead of being in a different league altogether, as it's marketed. Paraphrasing the author: sure, Mythos will find lots of security issues in GnuTLS, but so will GPT or Opus (they acknowledge explicitly that all those tools are getting very good).
Actually, OpenAI made a similar claim about one of their GPT models a while ago…

Funnily enough that was while Dario Amodei was their research director.

If you're referring to GPT-2 in 2019, that was primarily about concerns with it being used by spammers and fake content generators. In retrospect, that was a totally valid concern.
They had a subreddit with GPT-2 going back and forth. I have to say I got suckered into a conversation before I figured it out -- it was definitely the OG Moltbook of non sequiturs
> None of that other LLM tooling came with claims that it was too dangerous to be released and used though, unlike what Anthropic did with Mythos.

I do think they've said similar things in the past, but regardless Anthropic's BS marketing is something to behold and viewing it with extreme skepticism is smart.

> What it highlights is that Mythos doesn't seem so much better than other LLM-driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.

That's the conclusion Daniel draws, and it definitely seems plausible. His opinion carries a lot of weight with me.

But I hedge a little because we don't really know how much human labor was required to supplement those earlier LLM-assisted reviews of curl, nor do we know how easy it was for the person who used Mythos to generate the new batch. So the kind of bug hunting that is "possible but still labor intensive" with current tooling might be far easier with Mythos, even for less skilled developers.

And who knows, maybe Mythos is better on worse codebases; curl benefits from being very good to start with :)

Too dangerous to be released, right after the Department of Defense* dropped them
Everyone should be using exclusively a proof assistant (Lean/Agda/Rocq/Isabelle) and proving their code correct, but they're not.

Do you see how ridiculous the zealotry sounds when it's not your personal kind of zealotry?