I met a few people at PyCon this week who have been part of Glasswing (they're just starting to be allowed to talk about it) and it really does drive down the cost of finding vulnerabilities.
You might point them at the Cloudflare blog post about deploying Mythos; I found it interesting. Upshot: as your folks discovered, the deployment, harness, and utilization method matter for Mythos and are a bit different from how you’d deploy a coding agent for writing code, but if you get those right, you get something capable of end-to-end chaining and reasoning about a much broader class of vulnerabilities.
No personal experience with it, but the security team writeups I’ve read are significantly more positive about it than you describe, so it might be worth a second look.
Wouldn't it drive up the cost of finding vulnerabilities once all the low-hanging fruit has already been scanned and patched? The new baseline for finding a vulnerability would be something an LLM couldn't find.