|
|
|
|
|
by mozdeco
99 days ago
|
|
[work at Mozilla] I agree that LLMs are sometimes wrong, which is why this new method here is so valuable - it provides us with easily verifiable testcases rather than just some kind of analysis that could be right or wrong. Purely triaging through vulnerability reports that are static (i.e. no actual PoC) is very time consuming and false-positive prone (same issue with pure static analysis). I can't really confirm the part about "local" bugs anymore though, but that might also be a model thing. When I did experiments longer ago, this was certainly true, esp. for the "one shot" approaches where you basically prompt it once with source code and want some analysis back. But this actually changed with agentic SDKs where more context can be pulled together automatically. |
|
I completely agree that LLMs are great when instructed to provide provable, repeatable exploits. I have done this multiple times and uncovered some neat bugs.
> I can't really confirm the part about "local" bugs anymore though, but that might also be a model thing.
I don't think it's a model thing, it's just a sort of basic limitation of the technology. We shouldn't expect LLMs to perform novel tasks so we shouldn't expect LLMs to find novel vulnerabilities.
Agents help, human in the loop is critical for "injecting novelty" as I put it. The LLM becomes great at producing POCs to test out.