|
|
|
|
|
by Illniyar
30 days ago
|
|
'Narrow scope produces better findings - Telling the model "Find vulnerabilities in this repository" makes it wander. Telling it "Look for command injection in this specific function, with this trust boundary above it, here's the architecture document and here's prior coverage of this area" makes it do something much closer to what a researcher would actually do.' So what, we take every function and every vulnerability type and just run the agents millions of times? I would expect Mythos to be able to find vulnerabilities without pointing it out for him, otherwise it's no better from other agents. It's just has a better harness. |
|
Yes.
We build a skill where a coordinator AI enumerates all possible vulnerability types and all functions, then launches parallel max effort Mythos agents against all vulnerability x function pairs.
I've been doing something like this with Opus already. General code review. Enumerated dimensions like correctness, security, maintainability, etc. Asked the coordinator AI to explore the code and autodiscover subsystem boundaries. Then it runs an absurd amount of dimension x subsystem review agents.
It burns a lot of tokens and takes me like three days to complete a review session, but the results have been excellent so far. The resulting TODO list will keep me occupied for quite a while.
I can only imagine what these corporations with unlimited money are doing. Poor me can't afford API prices so I had to not only limit scope but also design a filesystem-like journaling mechanism for the agents in order to deal with the rate limit interruptions. I'm sure Cloudflare is not gonna have that problem.