Would be really interested to understand the cost to complete the reviews, and the person-hours spent getting to the actual vulns and getting rid of all the false positives. Did you come across many false positives?
Great question.
In our experiments the full process (discovery → validation) took on the order of hours rather than weeks, but the key part was filtering and validating results.
For false positives we use a specialized CAI agent called the "retester" agent. Its job is to automatically re-run and validate candidate exploits to confirm whether a vulnerability is actually reproducible.
So the workflow becomes: AI discovery → exploit generation → automated retesting → human review.
That reduces the manual time required to get from "potential issue" to confirmed vulnerability quite significantly compared to traditional robotics security research workflows.
Across the three consumer robots we tested, the system identified 38 validated vulnerabilities.
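To make the retesting step concrete, here is a minimal Python sketch of the idea. It is not CAI's actual API: `Candidate`, `retest`, and `triage` are hypothetical names, and the "reproduce at least 2 out of 3 attempts" threshold is an assumption chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """A potential vulnerability plus a callable that tries to exploit it.

    Hypothetical structure for illustration; not part of CAI.
    """
    name: str
    exploit: Callable[[], bool]  # returns True if the exploit worked


def retest(candidate: Candidate, attempts: int = 3, required: int = 2) -> bool:
    """Re-run the exploit several times; keep it only if it reproduces."""
    successes = 0
    for _ in range(attempts):
        try:
            if candidate.exploit():
                successes += 1
        except Exception:
            # A crashing exploit counts as a failed attempt.
            pass
    return successes >= required


def triage(candidates: List[Candidate]) -> List[Candidate]:
    """Filter candidates down to those that pass retesting.

    Whatever survives is what gets handed to a human reviewer.
    """
    return [c for c in candidates if retest(c)]
```

Only the candidates returned by `triage` would reach the human review stage, which is where the manual time saving comes from.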