Hacker News
by embedding-shape 41 days ago
None of those other LLM tooling made the claims they're too dangerous to be released and used though, unlike Anthropic did with Mythos.

What it highlights, is that Mythos doesn't seem so much better than other LLM driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.

5 comments

People love defending Anthropic's shortcomings…

“Mythos isn’t supposed to be that good at security, because actually Anthropic was referring more to running LLMs than to Mythos specifically”

“The opus model is worse because they have no compute because they are training mythos. The degraded performance is justified!”

“All the bugs in Claude Code are just because the models are so good they are just looping and shipping fast”

I constantly see people crawl out of the woodwork to defend a trillion-dollar company that overhypes every press release it puts out

If politicians can buy online supporters to manipulate perception during their election campaigns, I'd expect private corporations would too. Of course people can become very biased on their own, but an online PR/marketing/influencing campaign might encourage them to be more vocal.
It's silly to act like they've got mud on their face when Mythos and Opus are apparently some of the very best models. Anyone that has found value out of previous LLMs is likely to find more value out of the newest ones. The only thing Mythos looks bad against is the very tall bar some people have imagined. People are putting too much weight on marketing and then reaction to marketing.
> People are putting too much weight on marketing and then reaction to marketing.

No, what others are doing, which I've done myself in the past too, is to evaluate how much their marketing matches up with reality, then share our experience about that. Very different than just "putting too much weight on marketing".

It's important to keep in mind that very, very few projects are as rigorously tested as curl, so while it's interesting to hear this feedback I think curl would be a torture test for any security scanning. I'd be more interested to hear about other random libraries that aren't as thoroughly analyzed as curl; show me some results for GnuTLS, for example, or dpkg/rpm/apt/dnf/pacman/etc.
I think one of the points of TFA was that other AI tools found many vulnerabilities; after those were fixed, Mythos did find another vulnerability the others missed, but that seems to imply this model is only marginally better than the competition instead of being in a different league altogether, as it's marketed. Paraphrasing the author: sure, Mythos will find lots of security issues in GnuTLS, but so will GPT or Opus (they acknowledge explicitly that all those tools are getting very good).
Actually, OpenAI made a similar claim about one of their GPT models a while ago…

Funnily enough that was while Dario Amodei was their research director.

If you're referring to GPT-2 in 2019, that was primarily about concerns with it being used by spammers and fake content generators. In retrospect, that was a totally valid concern.
They had a Reddit with GPT-2 going back and forth. I have to say I got suckered into a conversation before I figured it out -- it was definitely the OG Moltbook of non sequiturs
> None of those other LLM tooling made the claims they're too dangerous to be released and used though, unlike Anthropic did with Mythos.

I do think they've said similar things in the past, but regardless Anthropic's BS marketing is something to behold and viewing it with extreme skepticism is smart.

> What it highlights, is that Mythos doesn't seem so much better than other LLM driven tooling at finding security issues, which was the strongest claim Anthropic made in the first place.

That's the conclusion Daniel draws, and it definitely seems plausible; his opinion absolutely carries a lot of weight with me.

But I hedge a little because we don't really know how much human labor was required to supplement those earlier LLM-assisted reviews of curl, nor do we know how easy it was for the person who used Mythos to generate the new batch. So the kind of bug hunting that might be "possible but still labor intensive" via current tooling might be far easier to accomplish with less skilled developers using Mythos.

And who knows, maybe Mythos is better on worse codebases; curl benefits from being very good to start with :)

Too dangerous to be released, right after the Department of Defense* dropped them