| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by syllogism 415 days ago

LLM agents are very hard to talk about because they're not any one thing. Your action-space in what you say and what approach you take varies enormously and we have very little body of common knowledge about what other people are doing and how they're doing it. Then the agent changes underneath you or you tweak your prompt and it's different again.

In my last few sessions I saw the efficacy of Claude Code plummet on the problem I was working on. I have no idea whether it was just the particular task, a modelling change, or changes I made to the prompt. But suddenly it was glazing every message ("you're absolutely right"), confidently telling me up is down (saying things like "tests now pass" when they completely didn't), it even cheerfully suggested "rm db.sqlite", which would have set me back a fair bit if I said yes.

The fact that the LLM agent can churn out a lot of stuff quickly greatly increases 'skill expression' though. The sharper your insight about the task, the more you can direct it to do something specific.

For instance, most debugging is basically a binary search across the set of processes being conducted. However, the tricky thing is that the optimal search procedure is going to be weighted by the probability of the problem occurring at the different steps, and the expense of conducting different probes.

A common trap when debugging is to take an overly greedy approach. Due to the availability heuristic, our hypotheses about the problem are often too specific. And the more specific the hypothesis, the easier it is to think of a probe that would eliminate it. If you keep doing this you're basically playing Guess Who by asking "Is it Paul? Is it Anne?" etc, instead of "Is the person a man? Does the person have facial hair? etc"

I find LLM agents extremely helpful at forming efficient probes of parts of the stack I'm less fluent in. If I need to know whether the service is able to contact the database, asking the LLM agent to write out the necessary cloud commands is much faster than getting that from the docs. It's also much faster at writing specific tests than I would be. This means I can much more neutrally think about how to bisect the space, which makes debugging time more uniform, which in itself is a significant net win.

I also find LLM agents to be good at the 'eat your vegetables' stuff -- the things I know I should do but would economise on to save time. Populate the tests with more cases, write more tests in general, write more docs as I go, add more output to the scripts, etc.