by tptacek 330 days ago
This is a confusing piece. A lot of it would make sense if Weakly was talking about a coding agent (a particular flavor of agent that worked more like how antirez just said he prefers coding with AI in 2025 --- more manual, more advisory, less do-ing). But she's not: she's talking about agents that assist in investigating and resolving operations incidents.

The fulcrum of Weakly's argument is that agents should stay in their lane, offering helpful Clippy-like suggestions and letting humans drive. But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents? AI tools are fundamentally better at this task than humans are, for the same reason that computers are better at playing chess.
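To make the mechanical part of "grovelling through logs" concrete, here is a minimal Python sketch. It is not something from Weakly's piece or from this thread. It collapses log lines into templates by masking numbers and hex IDs, then ranks templates by how much more often they appear in an incident window than in a baseline window. The masking rules, the add-one smoothing, and the function names are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed masking rules: hex IDs first, then any remaining runs of digits.
_HEX = re.compile(r"\b0x[0-9a-fA-F]+\b")
_NUM = re.compile(r"\d+")


def template(line: str) -> str:
    """Collapse the variable parts of a log line so similar lines group together."""
    return _NUM.sub("<N>", _HEX.sub("<HEX>", line.strip()))


def rank_anomalies(baseline: list[str], incident: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Rank templates by incident-window frequency relative to baseline frequency.

    Add-one smoothing on the baseline counts keeps templates never seen
    before the incident from dividing by zero (and ranks them highly).
    """
    base = Counter(template(line) for line in baseline)
    inc = Counter(template(line) for line in incident)
    base_total = sum(base.values())
    inc_total = sum(inc.values()) or 1
    scores = {}
    for tmpl, n in inc.items():
        inc_rate = n / inc_total
        base_rate = (base.get(tmpl, 0) + 1) / (base_total + len(inc))
        scores[tmpl] = inc_rate / base_rate
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Real tools use far better log parsers. The point is only that ranking "what changed" is mechanical work a machine can do before a human starts forming hypotheses.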

What Weakly seems to be doing is laying out a bright line between advising engineers and actually performing actions --- any kind of action, other than suggestions (and only those suggestions the human driver would want, and wouldn't prefer to learn and upskill on their own). That's not the right line. There are actions AI tools shouldn't perform autonomously (I certainly wouldn't let one run a Terraform apply), but there are plenty of actions where it doesn't make sense to stop them.

The purpose of incident resolution is to resolve incidents.

8 comments

There's no AI tool today that will resolve incidents to anyone's satisfaction. People need to be in the loop not only to take responsibility but to make sure the right actions are performed.
Nobody disputes this. Weakly posits a bright line between agents suggesting active steps and agents actually performing active steps. The problem is that during incident investigations, some active steps make a lot of sense for agents to perform, and others don't; the line isn't where she seems to claim it is.
Understood. To your example about the logs, my concern would be that the AI chooses the wrong thing to focus on and people decide there’s nothing of interest in the logs, thus overlooking a vital clue.
You wouldn't anticipate using AI tools to one-shot complex incidents, just to rapidly surface competing hypotheses.
Exactly. There seems to be this fantasy in which you can somehow string different kinds of agents together, one designing and one reviewing, and that finally produces something superior as output. I just don't buy that.

Sounds like heuristics added on top of statistics, which is trying to remedy some root problem with another hack.

Hmm, but this provably works right now, though? All LLMs perform better with roleplay direction and focused scope. Using coding agents in a plan-then-execute mode yields noticeable quality improvements.
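Here is a minimal sketch of the plan-then-execute pattern. It assumes a hypothetical `complete(system_prompt, user_message)` callable that wraps whatever model API you use. One role only produces a plan, and a separately scoped role handles each step. The role prompts are illustrative and are not taken from any particular agent product.

```python
from typing import Callable

# Hypothetical model interface: (system_prompt, user_message) -> reply text.
Complete = Callable[[str, str], str]

# Illustrative role prompts; each role gets a narrow, focused scope.
PLANNER = "You are a planner. Return a numbered list of small steps, one per line. Do not write code."
EXECUTOR = "You are an executor. Carry out exactly one step and report the result."


def plan_then_execute(task: str, complete: Complete) -> list[str]:
    """Ask for a plan in one role, then run each step in a separately scoped role."""
    plan = complete(PLANNER, task)
    # Treat every non-empty line of the plan as one step.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    results = []
    for step in steps:
        # The executor sees the overall task but is told to do only this step.
        results.append(complete(EXECUTOR, f"Task: {task}\nStep: {step}"))
    return results
```

Passing `complete` in as a parameter keeps the sketch independent of any vendor SDK and makes it easy to test with a fake model.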
Why isn't this the de facto standard, then? Is anyone packaging such commercial solutions?
Most agent solutions have modes or roles already. There’s no standard, but this is already being used IRL. Heck, even system prompts are roleplay too.
The whole field of metaheuristic algorithms rests on a similar idea: a lot of stupid "agents" finding a good solution. By metaheuristics I mean genetic algorithms, PSO, ACO, etc.
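Particle swarm optimization (PSO) is a compact example of the "many simple agents" idea. Each particle is pulled toward its own best position so far and toward the swarm's best position. Below is a minimal Python sketch. The inertia and attraction coefficients (`w`, `c1`, `c2`) are common textbook-style choices, assumed here rather than tuned.

```python
import random


def pso(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimization: minimize f over a box."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]      # each particle's personal best position
    best_val = [f(p) for p in pos]      # and the value it achieved there
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Move, clamping the particle to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            v = f(pos[i])
            if v < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], v
                if v < g_val:
                    g_pos, g_val = pos[i][:], v
    return g_pos, g_val


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(t * t for t in x)
```

No single particle is clever. The swarm finds the minimum because particles share their best finding.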
It's not a confusing piece if you don't skip/ignore the first part. You're using her one example and removing the portion about how human beings learn and how AI is actively removing that process. The incident resolution is an example of her general point.
I feel pretty comfortable with how my comment captures the context of the whole piece, which of course I did read. Again: what's weird about this is that the first part would be pretty coherent and defensible if applied to coding agents (some people will want to work the way she spells out, especially earlier in their career, some people won't), but doesn't make as much sense for the example she uses for the remaining 2/3rds of the piece.
It makes perfect sense for that case too. If you let AI do the whole job of incident handling (and leaving aside the problem where they'll get it horribly wrong), that also has the same problem of breaking the processes by which people learn. (You could make the classic "calculator" vs "long division" argument here, but one difference is, calculators are reliable.)

Also:

> some people will want to work the way she spells out, especially earlier in their career

If you're going to be insulting by implying that only newbies should be cautious about AI preventing them from learning, be explicit about it.

You can simply disagree with me and we can hash it out. The "early career" thing is something Weakly herself has called out.

I disagree with you that incident responders learn best by e.g. groveling through OpenSearch clusters themselves. In fact, I think the opposite thing is true: LLM agents do interesting things that humans don't think to do, and also can put more hypotheses on the table for incident responders to consider, faster, rather than the ordinary process of rabbitholing serially down individual hypotheses, 20-30 minutes at a time, never seeing the forest for the trees.
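The contrast between serial rabbitholing and putting many hypotheses on the table at once can be sketched as a program that runs independent, read-only checks concurrently and returns every result side by side. This is a hypothetical Python sketch. The hypothesis names and check functions are placeholders, not any real tool's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, NamedTuple


class Finding(NamedTuple):
    hypothesis: str
    supported: bool
    evidence: str


# Each check is a hypothetical read-only probe returning (supported, evidence).
Check = Callable[[], tuple[bool, str]]


def triage(hypotheses: dict[str, Check], max_workers: int = 8) -> list[Finding]:
    """Run every hypothesis check concurrently and return all findings together."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(check) for name, check in hypotheses.items()}
        findings = []
        for name, fut in futures.items():
            try:
                ok, evidence = fut.result()
            except Exception as exc:  # a failing probe is itself worth reporting
                ok, evidence = False, f"check failed: {exc!r}"
            findings.append(Finding(name, ok, evidence))
    # Supported hypotheses first, so the strongest leads are at the top.
    return sorted(findings, key=lambda f: not f.supported)
```

A human still decides what the findings mean. The sketch only changes how many leads are in front of them at once.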

I think the same thing is probably true of things like "dumping complicated iproute2 routing table configurations" or "inspecting current DNS state". I know it to be the case for LVM2 debugging†!

Note that these are all active investigation steps, that involve the LLM agent actually doing stuff, but none of it is plausibly destructive.

† Albeit tediously, with me shuttling things to and from an LLM rather than an agent doing things. This sucks, but we haven't solved the security issues yet.

The only mention I see of early-career coming up in the article is "matches how I would teach an early career engineer the process of managing an incident". That isn't a claim that only early career engineers learn this way or benefit from working in this style. Your comment implied that the primary people who might want to work in the way proposed in this article are those early in their career. I would, indeed, disagree with that.

Consider, by way of example, the classic problem of teaching someone to find information. If someone asks "how do I X" and you answer "by doing Y", they have learned one thing (and will hopefully retain it). If someone asks "how do I X" and you answer "here's the search I did to find the answer of Y", they have now learned two things, and one of them reinforces a critical skill they should be using throughout their career.

I am not suggesting that incident response should be done entirely by hand, or that there's zero place for AI. AI is somewhat good at, for instance, looking at a huge amount of information at once and pointing towards things that might warrant a closer look. I'm nonetheless agreeing with the point that the human should be in the loop to a large degree.

That also partly addresses the fundamental security problems of letting AI run commands in production, though in practice I do think it likely that people will run commands presented to them without careful checking.

> none of it is plausibly destructive

In theory, you could have a safelist of ways to gather information non-destructively. In practice, it would not surprise me at all if people don't. I think it's very likely that many people will deploy AI tools in production and not solve any of the security issues, and incidents will result.
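Here is a minimal sketch of the safelist idea. It assumes commands arrive as strings and are executed without a shell. It uses exact argv matching because prefix matching is risky: a trusted prefix followed by extra arguments may do something other than read. The specific commands in `SAFE_COMMANDS` are illustrative assumptions, not a vetted list.

```python
import shlex

# Assumed safelist of exact argv vectors treated as read-only. Illustrative only.
SAFE_COMMANDS = {
    ("ip", "route", "show", "table", "all"),
    ("ip", "rule", "show"),
    ("nft", "list", "ruleset"),
    ("dmsetup", "table"),
    ("journalctl", "--no-pager", "-n", "200"),
}


def is_safelisted(command: str) -> bool:
    """Return True only if the whole command exactly matches a safelisted argv.

    The string is tokenized with shlex and is meant to be run without a shell,
    so characters like `;` or `|` are just parts of arguments here, and any
    extra argument makes the command fail the exact match.
    """
    try:
        argv = tuple(shlex.split(command))
    except ValueError:  # e.g. unbalanced quotes
        return False
    return argv in SAFE_COMMANDS
```

Exact matching is deliberately rigid. Relaxing it means reasoning about every flag of every tool, and that is where people tend to cut corners.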

I am all for the concept of having a giant dashboard that collects and presents any non-destructive information rapidly. That tool is useful for a human, too. (Along with presenting the commands that were used to obtain that information.)
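The dashboard described here could start as little more than running a fixed set of non-destructive commands and printing each command next to its output. A human can then both read the result and learn how it was obtained. This is a hypothetical Python sketch. `collect`, `render`, and `Panel` are names invented for illustration, and the command list is left to the caller.

```python
import shlex
import subprocess
import sys
from typing import NamedTuple


class Panel(NamedTuple):
    command: str     # exactly what was run, shown to the human
    returncode: int  # -1 if the command could not be started or timed out
    output: str


def collect(commands: list[list[str]], timeout: float = 10.0) -> list[Panel]:
    """Run each argv without a shell and keep the command next to its output."""
    panels = []
    for argv in commands:
        shown = shlex.join(argv)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            panels.append(Panel(shown, proc.returncode, proc.stdout + proc.stderr))
        except (OSError, subprocess.TimeoutExpired) as exc:
            panels.append(Panel(shown, -1, f"could not run: {exc}"))
    return panels


def render(panels: list[Panel]) -> str:
    """Plain-text 'dashboard': each command as a shell-style header, then its output."""
    return "\n".join(
        f"$ {p.command}  (exit {p.returncode})\n{p.output.rstrip()}\n" for p in panels
    )


if __name__ == "__main__":
    # Demo with a harmless command; a real setup would pass safelisted diagnostics.
    print(render(collect([[sys.executable, "-c", "print('hello from a probe')"]])))
```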

It's in her previous writing, Josh. And I'm done litigating whether I wrote the "early career" thing in bad faith; I expect you to be done too.

I don't see you materially disagreeing with me about anything. I read Weakly to be saying that AI incident response tools --- the main focus of her piece --- should operate with hands tied behind their back, delegating nondestructive active investigation steps back to human hands in order to create opportunities for learning. I think that's a bad line to draw. In fact, I think it's unlikely to help people learn --- seeing the results of investigative steps all lined up next to each other and synthesized is a powerful way to learn those techniques for yourself.

I know you go on to make a good argument downthread, but why do you feel the first part is defensible?

The author's saying great products don't come from solo devs. Linux? Dropbox? Gmail? Ruby on Rails? Python? The list is literally endless.

But the author then claims that all great products come from committee? I've seen plenty of products die by committee. I've never seen one made by it.

Their initial argument is seriously flawed, and not at all defensible. It doesn't match reality.

I just don't want to engage with it; I'm willing to stipulate those points. I'm really fixated on the strange example Weakly used to demonstrate why these tools shouldn't actually do things, but instead just whisper in the ears of humans. Like, you can actually make that argument about coding! I don't agree, but I see how the argument goes. I don't see how it makes any sense at all for incident response.
I know the "what you refer to as Linux is, in fact, GNU/Linux" thing has become a sort of tongue-in-cheek meme, but it actually applies here: crediting Linus Torvalds alone for the success of Linux ignores crucial contributions from RMS, Ken Thompson, Dennis Ritchie and probably dozens or hundreds of others.

Ruby on Rails? Are we talking about the Ruby part (Matz) or the Rails part (DHH)?

Dropbox was founded by Drew Houston and Arash Ferdowsi. The initial Gmail development team had multiple people plus the infrastructure and resources of Google. I'm not sure why people love the lone genius story so much, but it's definitely the exception and not the rule.

I think the problem is whether you want to give the AI access to prod. See the recent example where an AI wiped a DB despite instructions not to (models sometimes do a thing more often when told not to do it, because the negation isn't always reliably picked up).
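One way to address the "told not to, did it anyway" failure is to enforce the boundary in code instead of in the prompt. Commands outside a read-only allowlist are never executed without a human saying yes. Below is a minimal Python sketch. The `READ_ONLY` entries and the `approve` callback are hypothetical.

```python
from typing import Callable

# Hypothetical allowlist of read-only argv vectors; everything else needs a human.
READ_ONLY = {
    ("kubectl", "get", "pods"),
    ("ip", "route", "show"),
}

# Called with the proposed argv; returns True only if a human approves it.
Approver = Callable[[list[str]], bool]


def authorize(argv: list[str], approve: Approver) -> bool:
    """Decide in code, not in the prompt, whether an agent's command may run.

    Read-only commands pass automatically. Anything else (a database write,
    a `terraform apply`, an unknown binary) is routed to a human approver.
    """
    if tuple(argv) in READ_ONLY:
        return True
    return approve(argv)
```

The default is deny-unless-approved. A model that ignores an instruction in its prompt still cannot get past this check on its own.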
I know you've got a subthread about this exact idea, but I do think there is some value in manually performing the debugging process if (and perhaps only if) your goal is to improve your overall programming ability.

I guess the chess analogy would be that it makes a lot of sense to analyse positions yourself, even though Leela and Stockfish can do a far more thorough job in much less time. Of course, if you just need to know the best move right now, you would use the AI, and professionals do that all the time.

But as a decently strong chess player I cannot imagine improving without doing this kind of manual practice (at least beyond a basic level of skill like knowing how pieces move). Grandmasters routinely drill tactics exercises, for instance, even though they are "mundane" at that level of ability.

I guess the crux of it is this: does an AI plus a person learn faster than a person alone for this kind of thing? And why? It's not obvious to me either way (and another question is whether the skill is even relevant any more... I think so, but I know people who don't).

But you can do that _after_ the incident. When things are not on fire.

You don’t run analysis of your chess game when the clock is ticking.

Sure, if something is super critical then you should solve the problem as fast as possible. I'm not debating that. But there's probably a middle ground there somewhere for less critical issues. I suspect the process of generating and falsifying hypotheses quickly is the skill, and I don't know if you can effectively train that skill after an incident, when you've already seen the resolution.

Chess is maybe not a great analogy, because there are rarely objectively correct answers, only hard trade-offs. For that reason there's still a lot of value in reviewing a finished game.

> There are actions AI tools shouldn't perform autonomously (I certainly wouldn't let one run a Terraform apply), but there are plenty of actions where it doesn't make sense to stop them.

I'm curious as to where you would draw the line. Assuming you've adhered to DevOps best practices, most--if not all--changes would require some sort of code commit and promotion through successive environments to reach production. This isn't just application code, of course; it's also your infrastructure. In such a situation, what would you permit an agent to autonomously perform in the course of incident resolution?

During incident resolution, most of the actions an operator takes are diagnostic commands, not changes.
The number one cause of incidents is change, and the number one response to them is to initiate a rollback. Maybe you’re right about investigation, which requires no changes, but resolution requires action, which does.

In any event, you said:

> What Weakly seems to be doing is laying out a bright line between advising engineers and actually performing actions --- any kind of action, other than suggestions (and only those suggestions the human driver would want, and wouldn't prefer to learn and upskill on their own). That's not the right line.

So what’s your quibble exactly? Those suggestions would come from autonomous analyses, would they not? What is the right line, in your view?

I would not in 2025 during an incident response have an agent do speculative changes, or really any changes at all.

I would have an agent perform diagnostic steps: dumping devicemapper tables, iproute2 configurations, nftables rules, BGP advertisements, Consul data, and, especially, logs and oTel telemetry.
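The diagnostic categories listed above map fairly directly onto read-only commands. Below is a hypothetical catalog in Python. The tool choices (FRR's `vtysh` for BGP, systemd's `journalctl` for logs) are assumptions about the environment. oTel telemetry is left out because querying it depends on the backend.

```python
# Hypothetical catalog of read-only diagnostic commands, grouped by category.
# Tool choices (FRR's vtysh for BGP, systemd's journalctl for logs) are assumptions.
DIAGNOSTICS: dict[str, list[list[str]]] = {
    "devicemapper": [["dmsetup", "table"], ["dmsetup", "status"]],
    "iproute2": [["ip", "route", "show", "table", "all"], ["ip", "rule", "show"]],
    "nftables": [["nft", "list", "ruleset"]],
    "bgp": [["vtysh", "-c", "show bgp summary"]],
    "consul": [["consul", "members"], ["consul", "operator", "raft", "list-peers"]],
    "logs": [["journalctl", "--no-pager", "-p", "err", "--since", "15 min ago"]],
}


def plan(categories: list[str]) -> list[list[str]]:
    """Flatten the selected categories into an ordered list of argv vectors.

    Unknown categories contribute nothing rather than raising.
    """
    return [argv for c in categories for argv in DIAGNOSTICS.get(c, [])]
```

Everything here only reads state. None of these commands changes anything, which is the category an agent could plausibly run on its own.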

Weakly's article is in large part about not allowing agents to do the things in the second category there.

Lost in a bit of the discourse around anomaly detection and incident management is that not all problems are equal. Many of them actually are automatable to some extent. I think the issue is understanding when something is sufficiently offloadable to some cognitive processor vs. when you really do need a human engineer involved. To your point, yes, they are better at detecting patterns at scale … until they’re not. Or knowing if a pattern is meaningful. Of course not all humans can fill these gaps either.
> But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents?

Agreed! I think about this using Weakly's own reference to "standing on the shoulders of giants."

To me, building abstractions to handle tedious work is how we do that. We moved from assembly to compilers, and from manual memory management to garbage collectors. That wasn't "deskilling" - it just freed us up to solve more interesting problems at a higher level.

Manually crawling through logs feels like the next thing we should happily give up. It's painful, and I don't know many engineers who enjoy it.

Disclaimer: I'm very biased - working on an agent for this exact use case.

> The fulcrum of Weakly's argument is that agents should stay in their lane, offering helpful Clippy-like suggestions and letting humans drive. But what exactly is the value in having humans grovel through logs to isolate anomalies and create hypotheses for incidents?

See also: Tool AIs Want To Be Agent AIs.

https://gwern.net/tool-ai

Predicted almost a decade ago.