by 0points 325 days ago
> it has found them (and other unsolicited bugs) that nobody was able to find themselves.

How did you evaluate this? Would be interested in seeing results.

I am specifically interested in the amount of false issues found by the LLM, and examples of those.


Well, how do you verify any bug? You listen to someone's explanation of the bug and double-check the code. You look at their proposed fix. Ideally you write a test that reproduces the bug and then verifies the fix.
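As a minimal sketch of that loop, suppose (hypothetically) someone reports that a date helper treats 1900 as a leap year. One test table can confirm the report against the old code and then confirm the proposed fix. The function names and cases below are made up for illustration and do not come from any project in this thread.

```python
from typing import Callable, Dict, List

# Hypothetical regression table: year -> expected leap-year answer.
CASES: Dict[int, bool] = {1900: False, 2000: True, 2024: True, 2023: False}


def is_leap_year_buggy(year: int) -> bool:
    # Reported behavior: every year divisible by 4 counts as a leap year.
    return year % 4 == 0


def is_leap_year(year: int) -> bool:
    # Proposed fix: Gregorian rule, where century years must also be divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def failing_cases(impl: Callable[[int], bool]) -> List[int]:
    """Return the years where impl disagrees with the expected answer."""
    return [year for year, expected in CASES.items() if impl(year) != expected]


print(failing_cases(is_leap_year_buggy))  # [1900]: the bug is reproduced
print(failing_cases(is_leap_year))        # []: the fix passes the same cases
```

Keeping the reproducing case in the suite afterwards is what stops the bug from quietly coming back.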

There are false positives, and they mostly come from the LLM missing relevant context, like a detail about the priors or the database schema. The iterative nature of an LLM conversation means you can add context as needed and ratchet in on real bugs.

But the false positives involve the exact same cycle you do when you're looking for bugs yourself. You look at the haystack and you have suspicions about where the needles might be, and you verify.

> Well, how do you verify any bug?

You do or you don't.

Recently we've seen many "security researchers" doing exactly this with LLMs [1].

1: https://www.theregister.com/2025/05/07/curl_ai_bug_reports/

Not suggesting you are doing any of that, just curious what's going on and how you are finding it useful.

> But the false positives involve the exact same cycle you do when you're looking for bugs yourself.

In my 35 years of programming I never went just "looking for bugs".

I have a bug and I track it down. That's it.

Sounds like your experience is similar to using deterministic static code analyzers, only more expensive, more time-consuming, more ambiguous, and prone to hallucinating non-issues.

And you don't even get a report to save and share.

So is it saving you any time or money yet?

Oh, I go bug hunting all the time in sensitive software. It's the basis of test synthesis as well. Which tests should you write? Maybe you could liken that to considering where the needles will be in the haystack: you have to think ahead.

It's a hard, time-consuming, and meandering process to do this kind of work on a system, and it's what you might have to pay expensive consultants to do for you, but it's also how you beat an expensive bug to the punchline.

An LLM helps me run all sorts of considerations on a system that I didn't think of myself, but that process is no different than what it looks like when I verify the system myself. I have all sorts of suspicions that turn into dead ends because I can't know what problems a complex system is already hardened against.

What exactly stops two in-flight transfers from double-spending? What about when X? And when Y? And what if Z? I have these sorts of thoughts all day.
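To make the double-spend question concrete, here is a minimal Python sketch of the classic check-then-act race. The `Account` class, its methods, and the `run_concurrently` helper are hypothetical illustrations, not code from any system discussed here; a real payment system would more likely rely on database transactions or row locks than on an in-process lock.

```python
import threading
from typing import Callable, List, Optional


class Account:
    """Hypothetical in-memory account, for illustration only."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_unsafe(self, amount: int, pause: Optional[threading.Barrier] = None) -> bool:
        # Check-then-act with no guard: two in-flight calls can both pass the check.
        if self.balance >= amount:
            if pause is not None:
                pause.wait()  # demo hook: hold each thread until both have passed the check
            self.balance -= amount
            return True
        return False

    def withdraw_safe(self, amount: int) -> bool:
        # Holding the lock makes the check and the debit a single atomic step.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False


def run_concurrently(fn: Callable[[], bool], n: int = 2) -> List[bool]:
    """Run fn in n threads at once and collect each call's result."""
    results: List[bool] = []
    results_lock = threading.Lock()

    def worker() -> None:
        ok = fn()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Two transfers of 100 against a balance of 100.
racy = Account(100)
barrier = threading.Barrier(2)
print(run_concurrently(lambda: racy.withdraw_unsafe(100, barrier)))  # [True, True]: double-spend

safe = Account(100)
print(sorted(run_concurrently(lambda: safe.withdraw_safe(100))))  # [False, True]
```

The barrier only exists to make the bad interleaving deterministic for the demo. Without it the race is still there but shows up intermittently, which is why this kind of bug is hard to catch with ordinary tests and worth reasoning about directly.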

I can sense a little vinegar at the end of your comment. Presumably something here annoys you?

> I can sense a little vinegar at the end of your comment. Presumably something here annoys you?

Thanks for your responses.

Really sorry about the vinegar; it wasn't intentional. I may have some kind of personality disorder, I don't know. I'm blunt, and my communication skills aren't great.

It's ok, I do worse things on HN.

My vice is when someone writes a comment where I have a different opinion than them, and their comment makes me think of my own thoughts on the subject.

But since I'm responding to them, I feel like it's some sort of debate/argument even though in reality I'm just adding my two cents.