	by concinds 45 days ago
	These models demonstrably have good vulnerability research capabilities. I'm sure their marketing department is ecstatic but you guys are far more hype-based than what you're calling out.

2 comments

authnopuz 45 days ago

Good, but not necessarily better than what is already available pay-as-you-go today. Ref.: https://www.flyingpenguin.com/the-boy-that-cried-mythos-veri...

This AISLE benchmark is interesting in this regard: https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jag...

And the recently discovered Copy Fail by Xint code is further evidence that the gating is overblown: https://xint.io/blog/copy-fail-linux-distributions

aesthesia 44 days ago

Calling the AISLE experiment a "benchmark" is generous. They tested three code snippets on each model.

ZyanWu 45 days ago

> demonstrably

I'm not entirely up to date on each week's LLM hype train/scandal, but last I heard there was no public access to it and no publicly trusted third parties that could review the model's capabilities.

concinds 44 days ago

I don't think so

https://x.com/AISecurityInst/status/2049868227740565890

2ndorderthought 45 days ago

You are up to date. There was unauthorized access to Mythos because of poor security, but that's it as far as I know. Not exactly a good sign for something being advertised as a weapon...

saghm 44 days ago

You'd think that if Mythos were so good at finding security issues, they could have pointed it at their own setup and found those issues easily...

SpicyLemonZest 45 days ago

It’s easy to end up with no public-trusted third parties if we arbitrarily distrust third parties who say the capabilities match what’s promised. Mozilla for example says it found hundreds of Firefox vulnerabilities, and I think it’s pretty unlikely they’re lying to cover Anthropic’s back.

calgoo 45 days ago

I think the issue with the Firefox find is that they didn't find hundreds of vulnerabilities; they found hundreds of bugs.

What would be really interesting is a side-by-side comparison of Claude Opus 4.7 and Mythos.
