Hacker News
by simonw 68 days ago
> I hated writing software this way. Forget the output for a moment; the process was excruciating. Most of my time was spent reading proposed code changes and pressing the 1 key to accept the changes, which I almost always did. [...]

That's why they hated it. Approving every change is the most frustrating way of using these tools.

I genuinely think that one of the biggest differences between people who enjoy coding agents and people who hate them is whether or not they run in YOLO mode (aka dangerously-skip-permissions). YOLO mode feels like a whole different product.

I get the desire not to do that because you want to verify everything they do, but you can still do that by reviewing the code later on without the pain of step-by-step approvals.

8 comments

>reviewing the code later on without step-by-step approvals

I found that Claude likes to leave some real gems in there if you get lazy and don't check. They're gently sprinkled among 100 lines of otherwise fine-looking code, and they sow doubt about every other line it has written. Sometimes it makes a horrific architectural decision, and if that doesn't get caught right there it's catastrophic for the rest of the session.

Or it casually forgets to implement some requirements, which one finds out about when the program runs, hits that pathway, and either crashes or does nothing.
Are you not giving it enough information to work with? All of these issues you and the parent comment mentioned can be worked around by telling it HOW to do things.
The whole shtick of LLMs is that they can do stuff without being told explicitly. Not sure why people are blamed for using them based on that expectation.
Yes, it can. So can I. But neither of us will write the code exactly the way nitpicky PR reviewer #2 demands it be written unless he makes his preferences clear somewhere. Even at a nitpick-hellhole like Google that's mostly codified into a massive number of readability rules, which can be found and followed in theory. Elsewhere, most reviewer preferences are just individual quirks that you have to pick up on over time, and that's the kind of stuff that neither new employees nor Claude will ever possibly be able to get right in a one-shot manner.
Sure, but that is not what the OP talks about.
There is an unconstrained number of ways it can write code and still not be how I want it. Sometimes it's easier to write the correction against the code that is already generated since now you at least have a reference to something there than describing code that doesn't yet exist. I don't think it's solvable in general until they have the neuralink skill that senses my approval as it materializes each token and autocorrects to the golden path based on whether I'm making a happy or frowny face.
Stop thinking like a programmer and start thinking like a business person. Invest time and energy in thinking about WHAT you want; let the LLM worry about the HOW.
The thing is that the HOW of today becomes the context of someone else's session tomorrow. That person may not be as knowledgeable about that particular part of the codebase (and the domain), so their LLM will base its own solution on today's unchecked output and will, inevitably, stray a little further from the optimum. So far I haven't seen any mechanism or workflow that would consistently push in the opposite direction.
>let the LLM worry about the HOW.

You mean, let the LLM hallucinate about the HOW...

You can tell it how to do things, but sometimes it still goes off on its own. I have some variant of "do not deviate from the plan", and yet if you watch while it's coding it will sometimes say "ah, this is too hard as per the plan, let me take this shortcut" or "this previous test fails, but it's not an issue with my code I just wrote, so let's just 'fix' the test".

For simple scripts and simple self-contained problems, fully agentic YOLO mode mostly works, but as soon as it's an existing codebase or the plans get more complex, I find I have to handhold Claude a lot more, and if I leave it to its own devices I find problems later. I have also found that if I have it update the plan with what it did AND afterwards review the plan, it will still find deviations in the codebase.

The other day, for example, the plan called for a refactor due to data model changes and stated very clearly that this was an intentional breaking change (greenfield project under development). It still left all the existing code in place to preserve backwards compatibility, with so many code contortions to make that happen that I had to redo the whole thing.

Sometimes it does feel like Anthropic turns the intelligence up or down (I always run Opus in high reasoning), but sometimes it seems it's just the nature of things: it is not deterministic, and sometimes it will just go off and do what it thinks is best whether or not you prompt it not to. (If you ask it later why it did that, it will apologize with some variation of "well, it made sense at the time".)

Technically that's true, but unless you literally write every single line of code, the LLM will find a way to smuggle in some weirdness. Usually it isn't that bad, but it definitely requires quite a lot of attention.
There is a point where telling it how to do stuff takes comparable or more effort than just doing it yourself.
> I get the desire not to do that because you want to verify everything they do, but you can still do that by reviewing the code later on without the pain of step-by-step approvals.

It's a well-known truth in software development that programmers hate having to maintain code written by someone else. We see all the ways in which they wrote terrible code, that we obviously would never write. (In turn, the programmers after us will do the same thing to our code.)

Having to get into the mindset of the person writing the code is difficult and tiring, but it's necessary in order to realise why they wrote things the way they did - which in turn helps you understand the problems they were solving, and why the code they wrote actually isn't as terrible in context as it looked at first glance.

I think it makes sense that this would also apply to the use of generative AI when programming - reviewing the entire codebase after it's already been written is probably more error-prone and difficult than following along with each individual step that went into it, especially when you consider that there's no singular "mindset" you can really identify from AI-generated output. That code could have come from anywhere...

I think that those permissions are largely security theater anyway.

It would be better if an LLM coding harness just helped you set up a proper sandbox for itself (containers, VMs, etc.) and then ran inside the isolated environment unconstrained.

In setup mode, the only tool accessible to the agent should be running shell scripts, and each script should be reviewed before running.

Inside an isolated environment, there should be no permission system at all.

I'm legitimately curious - could you elaborate on the difference? Speaking as someone who has always preferred the commit-by-commit focus of a rebase instead of all-at-once merge conflict resolution, auditing all the changes together later doesn't sound more appealing than doing things incrementally.
It's far more sane to review a complete PR than to verify every small change. They are like dicey new interns - do you want to look over their shoulder all day, or review their code after they've had time to do some meaningful quantum of work?
> It's far more sane to review a complete PR than to verify every small change.

Especially when the harness loop works if you let it work. First pass might have syntax issues. The loop will catch it, edit the file, and the next thing pops up. Linter issues. Runtime issues. And so on. Approving every small edit and reading it might lead to frustrations that aren't there if you just look at the final product (that's what you care about, anyway).

The main difference in the current (theatrical) permission model is that the agent is blocked on waiting for your approval. So you can't just launch it and go do something else, because when you return you will see that nothing is done and it has just been waiting for your input all this time. You have to stare at the screen and do nothing, which is a really boring and unproductive way to spend time.

If you launch it in YOLO mode in a separate branch in a separate worktree (or, preferably, in total isolation), you can instead spend time reviewing changes from previous tasks or refining requirements for new tasks.

The choice isn't really between all at once and line by line. I always use accept all changes, but I make commits that I can review and consider in bigger pieces, usually smaller than the full PR.
I think it's too far to say you need YOLO mode — the author was correctly pointing to the "auto-accept all changes" setting. They should have just turned that on and then reviewed the changes in larger chunks. You don't have to let it go for half an hour and review the mess it cooked up — you can keep an eye on things and even manually make commits to break the work into logical pieces.

With auto-accept edits plus a decent allowlist for common commands you know are safe, the permission prompts you still get are much more tolerable. This does prevent you from using too many parallel agents at a time, since you do have to keep an eye on them, but I am skeptical of people using more than 3-5 anyway. Or at least, I'm sure there is work amenable to many agents but I don't think most software engineering is like that.

All that said, I am reaching the point where I'm ready to try running CC in a VM so I can go full YOLO.
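
To make the "decent allowlist for common commands" idea above concrete, here is a minimal sketch in Rust. It is purely illustrative: the function name `is_auto_approved`, the prefix-matching rule, and the sample entries are assumptions made up for this example, not how Claude Code or any other harness actually implements its permission checks.

```rust
/// Hypothetical allowlist check (illustrative only, not any real harness's logic).
/// A command is auto-approved when its leading tokens match an allowlist entry
/// and it contains no shell metacharacters that could chain a second command.
fn is_auto_approved(command: &str, allowlist: &[&str]) -> bool {
    // Anything that could chain or redirect commands always goes to a prompt.
    let has_shell_meta = command
        .contains(|c: char| matches!(c, ';' | '&' | '|' | '`' | '$' | '>' | '<' | '\n'));
    if has_shell_meta {
        return false;
    }

    let tokens: Vec<&str> = command.split_whitespace().collect();
    allowlist.iter().any(|entry| {
        let prefix: Vec<&str> = entry.split_whitespace().collect();
        // An empty entry must never match, otherwise everything would be approved.
        !prefix.is_empty() && tokens.starts_with(&prefix)
    })
}

fn main() {
    // Sample entries chosen for illustration.
    let allowlist = ["git status", "git diff", "cargo test"];
    for cmd in ["git status", "cargo test --lib", "git reset --hard"] {
        let verdict = if is_auto_approved(cmd, &allowlist) { "auto-approve" } else { "ask" };
        println!("{cmd}: {verdict}");
    }
}
```

Matching on whole tokens rather than raw string prefixes means an entry like `git status` does not accidentally approve `git status-something`, and destructive commands that are not listed still fall through to a prompt.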

Even if you don't want to do YOLO mode, there are things like Copilot Autopilot, or you can make the permissions for Claude so wide that it can work for an hour and let you come back to the artifact after lunch.
Yesterday I had it get the length of a word in characters by doing `word.len()`. In Rust. In 2026. Using Opus.

This again showed me that I can't go in YOLO mode. Things like this are disastrous if left to fester in a codebase.

Eh… I get what you’re saying, but the word “character” is super overloaded. C uses “char” to mean “byte”. Rust uses it to mean “Unicode scalar value” (which still isn’t a user-perceived character). The meaning that corresponds to “where should the caret move when I press the arrow keys in a text editor” turns out to only be meaningful in a tiny set of circumstances. The vast, vast, vast majority of the time, it doesn’t make sense to think about “characters” at all, and it’s just bytes you need to account for. I’m generally with you on AI needing serious review from knowledgeable humans or it can be a disaster, but “it misunderstood what I meant by characters” smells a lot more like you were unclear in your prompt.
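
For readers following the `word.len()` disagreement, here is a small sketch of the distinction in Rust. The helper names and sample strings are made up for illustration and do not come from the commenter's code. `str::len` returns the length in bytes of the UTF-8 encoding, while `chars().count()` counts Unicode scalar values; neither is a count of user-perceived characters.

```rust
/// Length in bytes of the UTF-8 encoding; this is what `str::len` returns.
fn byte_len(word: &str) -> usize {
    word.len()
}

/// Number of Unicode scalar values (Rust `char`s) in the string.
fn scalar_count(word: &str) -> usize {
    word.chars().count()
}

fn main() {
    // Escapes are used so the exact code points are unambiguous.
    let samples = [
        "hello",      // ASCII only: bytes and scalars agree
        "h\u{e9}llo", // precomposed e-acute: one scalar, two bytes
        "e\u{301}",   // 'e' + combining acute: two scalars, one visible glyph
    ];
    for word in samples {
        println!(
            "{word}: {} bytes, {} scalar values",
            byte_len(word),
            scalar_count(word)
        );
    }
}
```

The last sample is the case the reply is pointing at: it renders as a single glyph but counts as two `char`s. Counting user-perceived characters (grapheme clusters) is not in the standard library and needs an external crate such as `unicode-segmentation`.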
That's the thing. I didn't ask it about how to get to the width of the string.

It came up with a plan and I tried it.

...and then you get "the agent just git reset --hard 12 hours of my work!", because AI bros can't be bothered to make their tooling actually good and version the changes at the filesystem level. That takes more than putting another variation of "pretty please don't break things" in the prompt.