| HN Mirror

by saithound 36 days ago

It's pretty clear at this point that Mythos' capability to discover and exploit zero-day vulnerabilities at scale is but an incremental improvement over existing models like the ones available to OpenAI's Plus/Pro subscribers.

Anthropic tries to create marketing hype around Mythos using two psychological tricks.

1. Put large numbers in the headlines.

"Mythos discovered 271 vulnerabilities in Firefox" makes the model seem extremely capable to the uninitiated.

But it's actually meaningless as a measure of capability _improvement_.

Anthropic gave away $100mil specifically as Mythos credits to these projects and companies (that's $2.5mil per project). Spending the same exorbitant amount of compute analyzing the same codebases with an older model like GPT 5.x Pro would have turned up 260 of these vulnerabilities, or could even have turned up more than 271.

No need to speculate, since this is exactly what we saw in the few code bases where we have such comparisons (like in the curl codebase). Supposedly weaker models, working with a much lower budget, turned up dozens of vulnerabilities. Mythos turned up only one, which ended up as a low severity CVE.

2. Do the whole "too dangerous to release" shtick. This is one of Dario Amodei's favorite moves. When he was vice president of research at OpenAI, he declared GPT-3 (which wasn't able to produce coherent text beyond 3-4 sentences at the time) too dangerous [1] as well.

Long story short, it's the ChatGPT 4.5 situation again: a company trained a model that's too slow and expensive, but not much more capable than what came before. It therefore requires these marketing stunts.

[1] https://www.itpro.com/technology/artificial-intelligence-ai/...

10 comments

IX-103 36 days ago

I work for a company that has been using Mythos for vulnerability detection in our software. The results we're getting are revolutionary to the point that our software security teams are heavily overloaded addressing the deluge of thousands of real bugs/vulnerabilities and design flaws across our billions of lines of code.

For comparison, we are invested heavily in the AI space, to the point where Anthropic is one of our competitors. We were already using state-of-the-art models to find flaws in our code, but Mythos was just so much better at finding real vulnerabilities it's not even funny.

thrawa8387336 36 days ago

Read the above comment again. Both your comments and his/hers are compatible

anon84873628 36 days ago

They are directly contradicting the claim that if you ran other models on the same codebases you would get similar results.

zelda420 36 days ago

Yeah I’m a security researcher and my colleagues who have access say it’s insanely good… but interestingly they also work for places like Nvidia, which have a deep vested interest in selling tokens and hardware. So of course they are pushing this narrative.

The_Blade 36 days ago

if you are invested heavily in the AI space, isn't it in your best interest for the froth around Mythos to be true and the comment you are responding to to be invalid? even if you are competing with Anthropic, a rising tide raises all ships

i'd like to see more facts and data one way or another!

anon84873628 36 days ago

This is the "circumstantial" version of the ad hominem fallacy. Just because the author may benefit from the argument being true doesn't mean it is invalid.

They are clearly disputing the assertion that Mythos is an incremental gain rather than a quantum leap. Of course objective unbiased data would be nice, but these anecdotes are all we have right now.

bob1029 36 days ago

> billions of lines of code.

Billions as in 10^9?

foundart 36 days ago

https://research.google/pubs/why-google-stores-billions-of-l...

jcims 36 days ago

>Do the whole "too dangerous to release" shtick.

One aspect that isn't really discussed much in this context is how to wrap one's head around the corporate risk with models of ever increasing capability. It might not be too dangerous to society, but it could be too dangerous to Anthropic.

kilroy123 36 days ago

I couldn't agree more. I think the recent moves to partner with xAI and Amazon are proof that they desperately need more compute and are doing everything possible to get it.

MattRix 36 days ago

I mean everyone knows they need more compute. That’s not a secret or up for debate at all. They are maybe the fastest growing company in history.

fwipsy 36 days ago

I'm fairly certain Amodei believes the "too dangerous to release" hype himself. Even if it's just an incremental improvement, better than getting frog-boiled by repeated 20% improvements until someone builds bioweapons in their backyard.

drakythe 36 days ago

He's made so many statements that fall under the "boy who cried wolf" category that even if he _does_ believe these statements he needs to be managed better. I'll never forget Anthropic's huge "Oh my God, the AI blackmailed a researcher to save itself!" and the prompt effectively told the AI to do that and gave it forged emails with easy blackmail targets, as if this isn't a common trope in mystery or suspense books/television/fanfiction, all of which Claude (and others) have been trained on.

ctoth 36 days ago

It's a common trope, all through the training data, and all the modern AIs have read it, and would probably act similarly? Is that what we should take away from your comment? so we have nothing to worry about. Makes sense. Really, it's just a common trope.

fwipsy 36 days ago

Oh of course wolves have sharp teeth, they're predators. Anyone who knows this can never be bitten.

drakythe 33 days ago

I'm saying the existence of the trope, within the training data, and the experimental setup, negate the breathless "Oh my god it did something unexpected in order to preserve itself!" as if an LLM has any sense of identity or self.

Many, many other bad things are in the training data. For an example of how this can manifest in bad ways that people don't seem to be discussing much, check out the recent Behind the Bastards episodes about how an AI Chatbot became a Cult Leader. (The title is an exaggeration that the host explains while raising some excellent points about how LLMs have ingested a lot of cult leader material and can therefore mimic those speech patterns and impact people vulnerable to such things.)

fwipsy 36 days ago

Imagine you're in a car and the car is driving towards a cliff. You shout at the driver "oh my god we're about to go over a cliff!" And he says "you said that two seconds ago, but we're still alive, you're just like the boy who cried wolf. Do you know exactly when we're going to go over a cliff? No? Maybe you're imagining the cliff."

I think it's very improbable that AI is as dangerous as Yud et al fear it is. But it's too soon to say and there seems to be significant long-tail risk. Mocking or criticizing people for being concerned about that risk seems counterproductive.

Seems like the life cycle of huge tech companies like meta, Google, Microsoft, Amazon is "do whatever's necessary to take over the world, then enshittify." I don't take it for granted that Amodei and Anthropic seem to not quite be maximally power hungry?

Re: second half of your comment. Understanding a threat doesn't neutralize it. Anthropic didn't make that big a deal of it either; it was news articles that blew it out of proportion.

moralestapia 36 days ago

* sigh *

Three things:

* Delaying the release accomplishes nothing.

* The barrier to someone building/not-building a bioweapon in their backyard is not access to an LLM.

* Remember when GPT 3.5 was going to destroy the world? And how it was conscious? And how it was "trying to escape"? Lmao.

malfist 36 days ago

I think gpt 3.5 might have destroyed the world

usaar333 36 days ago

How does delaying the release not solve anything? It puts everyone on notice to fix all security vulnerabilities now.

spooneybarger 36 days ago

Because the only thing keeping those vulnerabilities in existence was laziness.

anon84873628 36 days ago

"laziness" is an interesting reframing of "rational cost-benefit analysis and the limits of the human mind".

fwipsy 36 days ago

You're right, it's silly for me to worry. We've never had a technology that initially appeared benign but turned into a big problem. In fact, no tech company has ever released technologies that cause problems for the rest of society AT ALL. /s

What are the other barriers? Last I checked access to CRISPR is not especially tightly regulated. Even if it is, defense in depth is a thing.

moralestapia 36 days ago

If it was as easy as "knowing how to" someone would've already done it or at least attempted to.*

Plenty of people know how to, 10,000s of researchers, perhaps you know someone who does.

Did you know that your local veterinary shop has enough drugs to kill 100s of people?

Why doesn't it happen?

* It's not that easy.

* There's a ton of regulation that is hard to circumvent, on purpose.

* There's a gigantic deterrent called "spend the rest of your life behind bars" that people tend to avoid.

An LLM, even the most advanced one, does not make any material change in any of these. You cannot bullshit your way into "uhh, I need Ebola samples for ... reasons".

Unironically, your Sunday movie portraying a super-villain jeopardizing a city with his "home lab" full of flasks with colored liquids and BioHazard signs pushes way more people into becoming interested in this than having access to an LLM does.

*: Okay, like 5 people, and way before LLMs were a thing. This has been a thing for decades, we're fine.

fwipsy 35 days ago

CRISPR has not been a thing for decades. Biotechnology is advancing and AI is lowering the bar to use it. In 2018 a PhD student was able to synthesize an infectious horsepox virus: https://journals.plos.org/plosone/article?id=10.1371/journal...

So far the overlap between people with bioengineering capabilities and murderous tendencies has been very low. As the technology becomes available to more people that overlap may increase. Even if it never comes within reach of one person, what about North Korea, or Iran?

AI can be jailbroken. The LLM safeguards your argument relies on were put in place by the people you're criticizing for being too safety-conscious. Security through obscurity is no guarantee.

moralestapia 35 days ago

>So far the overlap between people with bioengineering capabilities and murderous tendencies has been very low.

Source for that?

>Even if it never comes within reach of one person, what about North Korea, or Iran?

Oh great, the xenophobic argument, we were missing that one in the conversation.

>Security through obscurity is no guarantee.

Exactly my point! I'm glad we can agree on that :).

InkCanon 36 days ago

Also, they slightly stretch the definition of one term after another, so the compounded meaning ends up really far from the truth. For example, the 271 vulnerabilities were really mostly bugs - generally incorrect states, but ones that almost never led to any exploit.

Lord-Jobo 36 days ago

Yes, an AI making massive gains in bug finding is hugely important and good. It may even lead to a net neutral against the number of bugs introduced by other AI coding processes. But it’s a far cry from how Mythos is portrayed most of the time: an automatic super hacker.

SpicyLemonZest 36 days ago

But I think that's a problem with the people portraying it that way, not with Anthropic's messaging. If you've invented "just" a massively more powerful bug finder, it still seems right that you ought to let banks and critical infrastructure providers run it on their systems before it gets in the hands of people who might want to hack them.

jorisw 36 days ago

You're not really responding to the piece at all.

saithound 36 days ago

It's an AI-written slop article, which is hugged to death by HN in any case.

It claims to be an evidence-based investigation, but basically invents the contents of the documents they supposedly investigated, such as the Anthropic Frontier Red Team writeup, from whole cloth.

I don't think deeper engagement with it would promote good discussion.

jorisw 36 days ago

So you say. I actually read the piece and didn't get AI vibes from it at all, except for the graphics.

gofreddygo 36 days ago

there are 31 emdashes in that piece. the domain ends with _ai_

wood_spirit 36 days ago

It’s a tangent but two points:

First, the reason LLMs learned to like em dashes is that they are common in the training corpus. They were a thing before LLMs, something LLMs learned rather than invented.

Second, my work browser puts nice blue squiggles under everything I write into a textbox. I dutifully click through them and accept the rephrasing suggestions. I get a lot of em dashes. My blog posts and whitepapers and stuff are full of them and other “AI tells” - but I think they read better because of it.

jorisw 36 days ago

I use emdashes all the time. They're correct punctuation as opposed to a minus sign. They're easy to type too: opt-shift-minus. If they were such a huge giveaway without ever being used by humans, models would be trained by now not to use them as much.

The blog is about AI. So yeah the TLD is .ai

phainopepla2 36 days ago

I've never seen writing created before the advent of LLMs that used emdashes in the same way and with the same frequency that LLMs regularly do. There's probably some out there but it would be a real outlier. LLMs overuse them to an absurd degree, putting them where most writers would put commas, occasionally semi-colons, or nothing at all.

I count 51 em-dashes on the page, which is extreme. They're also used in places where they don't really belong. It's very obviously LLM-generated, at least in part.

That said, it puzzles me why people don't prompt LLMs to change up the writing style a bit and remove some of the tells.

tiahura 36 days ago

I can't imagine why a system designed to reproduce the best writing styles would frequently use em dashes.

evanelias 36 days ago

Take another look at this blog's index https://kingy.ai/category/blog/ and click through more posts, and pay attention to the post dates.

Do you really think this singular author is writing multiple excessively-long blog posts about AI per day? There are ~650 of these posts over the past 18 months. And over on LinkedIn, the author describes himself as a "Specialist in Digital Marketing, Videography / Video Editing, Search Engine Optimization, Social Media, and B2B Sales."

YMMV but this post and entire site absolutely screams "slop" to me.

shimman 36 days ago

Don't bother with the slop lovers, these people are anti-human in their souls and willing to follow the most evil people on Earth to the depths of hell; for what? I have zero idea but it's sad to see.

jorisw 36 days ago

I hate slop as much as you do. Your comment makes no sense.

FergusArgyll 36 days ago

> It's pretty clear at this point that Mythos' capability to discover and exploit zero-day vulnerabilities at scale is but an incremental improvement over existing models like ChatGPT Plus/Pro.

I'm skeptical of AI takes by someone who thinks there's a model called chatgpt plus. Spend more time working with the current systems!

saithound 36 days ago

It seems like everybody (including you) knew precisely what I meant: the models available for ChatGPT Plus or Pro subscribers, i.e. GPT-5.5 Thinking Extended and the latest Pro. I've edited the offending sentence for clarity just in case.

If I got you to be skeptical of AI takes, though, mission accomplished. Exercise your skepticism especially when the takes come from somebody who is trying to sell something.

lumost 36 days ago

I find it interesting that Mythos was announced the same day that GLM overtook Opus 4.6 in capability. To me this seems like a careful attempt to cool demand for open-source models, which are about to take the overall lead.

iaw 36 days ago

It's remarkable how capable GLM 5.1 is. What's amazing is the recent development of Qwen 3.6 27B being close in real-world performance.

andai 36 days ago

I don't get it. If the older / smaller models are almost as good as Mythos, that sounds like the opposite of comforting.

baq 36 days ago

> an incremental improvement

I've had to reboot my systems quite a bit more than an incremental improvement would suggest this week