by stratos123 54 days ago

In terms of quantity, definitely yes (a single person managing a swarm of Opusi can already find many more real bugs than a security researcher, hence the rise in reports).

In terms of quality ("are there bugs that professional humans can't see at any budget but LLMs can?") - it's not very clear, because Opus is still worse than a human specialist, but Mythos might be comparable. We'll just have to wait and see what results Project Glasswing gets.

Either way, cybersecurity is going to get real weird real soon, because even slightly-dumb models can have a large effect if they are cheap and fast enough.

EDIT: Mozilla thinks "no" to the second question, by the way: "Encouragingly, we also haven’t seen any bugs that couldn’t have been found by an elite human researcher.", when talking about the 271 vulnerabilities recently found by Mythos. https://blog.mozilla.org/en/firefox/ai-security-zero-day-vul...

chuckadams 54 days ago

> Opusi

The plural of "Opus" is "Opera". Might be a tad confusing tho :)

robocat 53 days ago

Opuses is also correct English, and clearer in non-academic contexts.

Opera is the traditional plural from Latin, now perhaps for more scholarly use in English.

Results from a quick search.

chuckadams 53 days ago

I'll do the faux German thing then: Opusen :)

robocat 40 days ago

pseudofauxteutönic

skeledrew 54 days ago

Wondered for a second "what does that browser have to do with all this?"

DanielHB 54 days ago

There is also a huge surface area of security problems that can't happen in practice due to how other parts of the code work. A classic example is unsanitized input being used somewhere where untrusted users can't inject any input.

Being flooded with these kinds of reports can make the actual real problems harder to see.

arcfour 53 days ago

They wouldn't be classed as vulnerabilities then, since, you know, there is no vulnerability. Unless you have evidence that most of these issues are unexploitable? I would be surprised if they were still counted as vulnerabilities in that case.

DanielHB 52 days ago

I believe the LLM would flag this kind of thing as a potential issue.