by 0points
325 days ago
> it has found them (and other unsolicited bugs) that nobody was able to find themselves.

How did you evaluate this? Would be interested in seeing results. I am specifically interested in the amount of false issues found by the LLM, and examples of those.

There are false positives, and they mostly come from the LLM missing relevant context like a detail about the priors or database schema. The iterative nature of an LLM convo means you can add context as needed and ratchet into real bugs.
But the false positives involve the exact same cycle you do when you're looking for bugs yourself. You look at the haystack and you have suspicions about where the needles might be, and you verify.