by stavros 76 days ago
I don't understand why the takeaway here is (unless I'm missing something), more or less "everything is going to get exploited all the time". If LLMs can really find a ton of vulnerabilities in my software, why would I not run them and just patch all the vulnerabilities, leading to perfectly secure software (or, at the very least, software for which LLMs can no longer find any new vulnerabilities)?
12 comments

When did we enter the twilight zone where bug trackers are consistently empty? The limiting factor of bug reduction is remediation, not discovery. Even developer smoke testing usually surfaces bugs far faster than they can be fixed, let alone actual QA.

To be fair, the limiting factor in remediation is usually finding a reproducible test case, which a vulnerability is by necessity. But I would still bet most systems have plenty of bugs in their trackers that come with a reproducible test case and are still bottlenecked on remediation resources.

This is of course orthogonal to the fact that patching systems that are insecure by design into security has so far been a colossal failure.

Bugs are not the same as (real) high severity bugs.

If you find a bug in a web browser, that's no big deal. I've encountered bugs in web browsers all the time.

You figure out how to make a web page that when viewed deletes all the files on the user's hard drive? That's a little different and not something that people discover very often.

Sure, you'll still probably have a long queue of ReDoS bugs, but the only people who think those are security issues are people who enjoy the ego boost of having a CVE in their name.

Eh, with browsers you can tell the user to go to hell if they don't like a secure but broken experience. The problem in most software is that you commit to bad ideas and then have to upset people who have higher status than the software dev that would tell them to go to hell.
That might have been true pre LLMs but you can literally point an agent at the queue until it’s empty now.
You literally cannot, since ANY changes to code tend to introduce unintended (or at least not explicitly requested) new behaviors.
Eventual convergence? Assuming each defect fix has a 30% chance of introducing a new defect, we keep cycling until done?
Assuming you can catch every new bug it introduces.

Both assumptions being unlikely.

You also end up with a code base you let an AI agent trample until it is satisfied: ballooned in complexity and full of redundant, brittle code.

You can have an AI agent refactor and improve code quality.
That's assuming that each fix can only introduce at most one additional defect, which is obviously untrue.
Why would it converge?
The chance of a defect fix introducing a new defect tends to grow linearly with the size of the codebase, since defects are usually caused by the interaction between code and there's now more code to interact with.

If you plot this out, you'll notice that it eventually reaches > 100%, and from there the total number of defects grows exponentially, as each bugfix introduces more bugs than it fixes. Which is what I've actually observed in 25 years in the software industry. How quickly you reach the point where new bugs are introduced faster than they are fixed varies by organization and the skill of your software architects - good engineers know how to keep coupling down and limit the space of existing code that a new fix could possibly break. I've seen some startups reach this asymptote before bringing the product to market (needless to say, they failed), it's pretty common for computer games to become steaming piles of shit close to launch, and I've even seen some Google systems killed and rewritten because it became impossible to make forward progress on them. I call this technical bankruptcy, the end result of technical debt.
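To make the "plot this out" step concrete, here is a rough expected-value sketch in Python. It assumes, as the comment above does, that the expected number of new defects per fix grows linearly with codebase size. The function names and every number in the example are invented for illustration, not measurements from any real project.

```python
def simulate_backlog(initial_defects, initial_size, rate_per_line,
                     lines_per_fix, max_fixes):
    """Expected-value model of a fix loop; returns the backlog after each fix.

    Assumption (hypothetical): each fix removes one defect and is expected
    to introduce rate_per_line * current_size new ones.
    """
    defects = float(initial_defects)
    size = float(initial_size)
    history = [defects]
    for _ in range(max_fixes):
        if defects <= 0:
            break  # backlog cleared, nothing left to fix
        # One defect fixed, rate_per_line * size expected new defects added.
        defects = max(defects - 1 + rate_per_line * size, 0.0)
        # Every fix adds code, so the next fix is riskier.
        size += lines_per_fix
        history.append(defects)
    return history


def break_even_size(rate_per_line):
    """Codebase size at which a fix is expected to add one defect."""
    return 1.0 / rate_per_line


# Starts at a 30% chance per fix (3e-5 * 10,000 lines), 200 lines per fix.
history = simulate_backlog(100, 10_000, 3e-5, 200, 400)
lowest = min(history)
print(f"break-even at about {break_even_size(3e-5):,.0f} lines")
print(f"lowest backlog {lowest:.1f} after {history.index(lowest)} fixes")
print(f"backlog after {len(history) - 1} fixes: {history[-1]:.1f}")
```

With these made-up numbers the backlog never reaches zero: it shrinks for a little over a hundred fixes, then grows once the codebase passes the break-even size. Keeping `lines_per_fix` or `rate_per_line` small (low coupling) pushes that crossover further out.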

As long as we're inventing numbers, what if it's a 90% chance?

What if it's a 200% chance, and every fix introduces multiple defects?
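Since numbers are being invented anyway, the arithmetic behind them is short. A minimal sketch, assuming each fix introduces on average `r` new defects, that `r` stays constant, and that every new defect is caught and queued (the assumption questioned upthread). The function name is hypothetical.

```python
import math


def expected_total_fixes(initial_defects, new_defects_per_fix):
    """Expected number of fixes needed to empty the queue.

    Each fix spawns r new defects on average, so the expected work is the
    geometric series N * (1 + r + r^2 + ...) = N / (1 - r), which is finite
    only when r < 1.
    """
    r = new_defects_per_fix
    if r >= 1:
        return math.inf  # each fix creates at least as much work as it removes
    return initial_defects / (1 - r)


for r in (0.3, 0.9, 2.0):
    print(r, expected_total_fixes(100, r))
```

Under those assumptions, 100 defects at a 30% rate take about 143 fixes, at 90% about 1,000, and at 100% or more the loop is not expected to finish. The average per fix is what matters, which is why allowing a single fix to introduce several defects changes the answer.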

Except they don't converge. You see that if you use agents to evolve a codebase. We also saw exactly that in the failed Anthropic experiment to create a C compiler.
I’ve had mine on a Ralph loop no problem. Just review the PR.
Which still means a single person with Claude can clear a queue in a day versus a month with a traditional team.
Your example must have incredible users or really trivial software.
The fact that KiCad still has a ton of highly upvoted missing features and the fact that FreeCAD still hasn't solved the topological renumbering problem are existence proofs to the contrary.
Shouldn't be downvoted for saying this. There are active repos this is happening in.

"BuT ThE LlM iS pRoBaBlY iNtRoDuCiNg MoRe BuGs ThAn It FiXeS"

This is an absurd take.

It probably is introducing more bugs, because I think some people don't understand how bugs work.

Very, very rarely is a bug a mistake. As in, something unintentional that you just fix and boom, done.

No no. Most bugs are intentional, and the bug part is some unintended side effect that is a necessary, but unforeseen, consequence of the main effect. So, you can't just "fix" the bug without changing behavior, changing your API, changing guarantees, whatever.

And that's how you get the 1 month 1-liner. Writing the one line is easy. But you have to spend a month debating if you should do it, and what will happen if you do.

So, you have already fixed all the bugs and now just cruising through life?
I wonder whether people like you have actually used Claude for any length of time.

I use it all day. I consider it a near-miracle. Yet I correct it multiple times daily.

> I wonder whether people like you have actually used Claude for any length of time.

I stated that LLMs are actively being used in repos today to chew through backlog items, and your response is to wonder if I've ever used Claude.

To me it's surprising that someone like you, who appears to have a reading comprehension deficiency, is able to use Claude.

The pressure to do so will only happen as a consequence of the predicted vulnerability explosion, and not before it. And it will have some cost, as you need dedicated and motivated people to conduct the vulnerability search, apply the fixes, and re-check until it comes up empty, before each new deployment.

The prediction is: Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”.

I feel like the dream of static analysis was always a pipe.

When the payment for vulns drops, I'm wondering where the value is for hackers to run these tools anymore. The LLMs don't do the job for you; testing is still a LOT OF WORK.

Breaking something is easier than fixing it.
People have said that for decades and it wasn't true until recently.
Hmm: can you elaborate?

I've never been on a security-specific team, but it's always seemed to me that triggering a bug is, for the median issue, easier than fixing it, and I mentally extend that to security issues. This holds especially true if the "bug" is a question about "what is the correct behavior?", where the "current behavior of the system" is some emergent / underspecified consequence of how different features have evolved over time.

I know this is your career, so I'm wondering what I'm missing here.

It has generally been the case that (1) finding and (2) reliably exploiting vulnerabilities is much more difficult than patching them. In fact, patching them is often so straightforward that you can kill whole bug subspecies just by sweeping the codebase for the same pattern once you see a bug. You'd do that just sort of as a matter of course, without necessarily even qualifying whether the bugs you're squashing are exploitable.

As bugs get more complicated, that asymmetry has become less pronounced, but the complexity of the bugs (and their patches) is offset by the increased difficulty of exploiting them, which has become an art all its own.

LLMs sharply tilt that difficulty back to the defender.

In a sense, breaking a vulnerability is easier than fixing it up to be an exploit.
Specifically in software vulnerability research, you mean.

Fixing vulnerable code is usually trivial.

In the physical world breaking things is usually easier.

A proper fix maybe. But LLMs can easily make it no longer exploitable in most cases.
That's why you simply make the LLM part of the CI checks on PRs.
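For what a CI gate like that might look like, a minimal sketch. Everything here is hypothetical: the `Finding` shape, the severity labels, and the `review` callable, which stands in for whatever model or service you would actually call with the PR diff. No real LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    file: str
    severity: str  # hypothetical labels: "low", "medium", "high"
    summary: str


SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


def gate_pull_request(diff: str,
                      review: Callable[[str], List[Finding]],
                      fail_at: str = "high") -> int:
    """Return a process exit code: 0 passes the check, 1 blocks the merge.

    `review` is an injected, hypothetical reviewer that takes the PR diff
    and returns a list of findings.
    """
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f for f in review(diff)
                if SEVERITY_RANK[f.severity] >= threshold]
    for f in blocking:
        print(f"[{f.severity}] {f.file}: {f.summary}")
    return 1 if blocking else 0


# Stub reviewer so the sketch runs without any external service.
def fake_review(diff: str) -> List[Finding]:
    if "strcpy(" in diff:
        return [Finding("parser.c", "high", "unbounded copy into fixed buffer")]
    return []


print(gate_pull_request("+ strcpy(buf, input);", fake_review))
print(gate_pull_request("+ return 0;", fake_review))
```

Injecting the reviewer keeps the gating logic testable on its own. It says nothing about how good the reviewer is, which is the part the rest of the thread is arguing about.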
That might be one outcome, especially for large, expertly-staffed vendors who are already on top of this stuff. My real interest is in what happens to the field for vulnerability researchers.
Perhaps a meta evolution, they become experts at writing harnesses and prompts for discovering and patching vulnerabilities in existing code and software. My main interest is, now that we have LLMs, will the software industry move to adopting techniques like formal verification and other perhaps more lax approaches that massively increase the quality of software.
> Perhaps a meta evolution, they become experts at writing harnesses and prompts

Harnesses, maybe, but prompts?

There's still this belief amongst AI coders that they can command a premium for development because they can write a prompt better than Bob from HR, or Sally from Accounting.

When all you're writing are prompts, your value is less than it was before, because the number of people who can write the prompt is substantially more than the number of people who could program.

Also, synthetic data and templates to help them discover new vulnerabilities or make agents work on things they're bad at. They differentiate with their prompts or specialist models.

Also, like ForAllSecure's Mayhem, I think they can differentiate on automatic patching that's reliable and secure. Maybe test generation, too, that does full coverage. They become drive by verification and validation specialists who also fix your stuff for you.

Testing exists.

> formal verification

Outside of limited specific circumstances, formal verification gives you nothing that tests don't give you, and it makes development slow and iteration a chore. People know about it, and it's not used for a lot of reasons.

This statement shows an intense lack of technical knowledge. You’re probably one of those ignorant managerial types.

First, type checking is a form of formal verification and it’s used everywhere. Second, have you heard of Rust? Do you know why it’s becoming an alternative to C++ or C? Entirely because of its type checker, a.k.a. formal verification. It is the literal main reason why Rust was created.

Have you heard of TypeScript? It’s essentially a formal verification layer over JavaScript. Everyone uses it now for the front end.

You don’t know what you’re talking about. I recommend you do some research before saying anything on this site.

I agree with this take. Nothing changes, everything just evolves. Been happening for 60 years, will (likely) continue to happen for the next 60 years.
True, but I already am curious to see what happens in a multitude of fields, so this is just one more entry in that list.
Just wanted to point out that tptacek is the blog post's author (and a veteran security researcher).
Attackers only have to be successful once while defenders have to be successful all the time?
Yes and no. Good defence is layered and an attacker needs to find a hole in each layer. Even if it is not layered intentionally a locally exploitable vulnerability gives little if you have no access to a remote system. But some asymmetry does exist.
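The two sides of that asymmetry can be put in numbers. A back-of-the-envelope sketch, assuming the probabilities are independent (real layers and attack paths usually are not) and using made-up values. Both function names are hypothetical.

```python
import math


def breach_all_layers(layer_bypass_probs):
    """Chance an attacker gets through every layer of a layered defence."""
    return math.prod(layer_bypass_probs)


def any_path_open(path_success_probs):
    """Chance that at least one of several independent attack paths works."""
    return 1 - math.prod(1 - p for p in path_success_probs)


# Three layers, each bypassed 50%, 20% and 10% of the time.
print(breach_all_layers([0.5, 0.2, 0.1]))

# Ten separate attack paths, each with a 10% chance of working.
print(any_path_open([0.1] * 10))
```

Layering multiplies the attacker's odds down, while every extra unpatched path pushes the "at least one works" figure up. That is the "yes and no" in the comment above.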
Find-then-patch only works if you can fix the bugs quicker than you’re creating new ones.

Some orgs will be able to do this, some won’t.

"Find me vulnerabilities in this PR."
My sense is that the asymmetry is a non-trivial issue here. In particular, a threat actor needs one working path, while defenders need to close all of them. In practice, patching velocity is bounded by release cycles, QA issues / regression risk, and a potentially large number of codebases that need to be looked at.
> If LLMs can really find a ton of vulnerabilities in my software, why would I not run them and just patch all the vulnerabilities, leading to perfectly secure software?

Probably because it will be a felony to do so. Or, the threat of a felony at least.

And this is because it is very embarrassing for companies to have society openly discussing how bad their software security is.

We sacrifice national security for the convenience of companies.

We are not allowed to test the security of systems, because that is the responsibility of companies, since they own the system. Also, companies who own the system and are responsible for its security are not liable when it is found to be insecure and they leak half the nation's personal data, again.

Are you seeing how this works yet? Let's not have anything like verifiable and testable security interrupt the gravy train to the top. Nor can we expect systems to be secure all the time, be reasonable.

One might think that since we're all in this together and all our data is getting leaked twice a month, we could work together and all be on the lookout for security vulnerabilities and report them responsibly.

But no, the systems belong to companies, and they are solely responsible. But also (and very importantly) they are not responsible and especially they are not financially liable.

>> If LLMs can really find a ton of vulnerabilities in my software, why would I not run them and just patch all the vulnerabilities, leading to perfectly secure software?

>Probably because it will be a felony to do so. Or, the threat of a felony at least.

"my software" implies you own it (i.e. your SaaS), so CFAA isn't an issue. I don't think he's implying that vigilante hackers should be hacking Gmail just because they have a Gmail account.

I've worked at companies that balked at spending $300 to buy me a second-hand ThinkPad because I really wanted to work on a Linux machine rather than a Mac. I don't see them throwing $unlimited at tokens to find vulnerabilities, at least until after it's too late.
I think you’re right that they’re going to skimp as much as regulators & the market let them, but that Thinkpad would cost a lot more than $300: a new platform is an ongoing cost for maintenance, security, and interoperability – not crushing, but those factors quickly outweigh the hardware.
Because not all software gets auto-updated. Most of it does not!
Takeaway is formal software.
closed source software

deliberate vulnerabilities (thanks nsa)