HN Mirror

by koreth1 100 days ago

I wish I had this kind of experience. I threw a tedious but straightforward task at Claude Code using Opus 4.6 late last week: find the places in a React code base where we were using useState and useEffect to calculate a value that was purely dependent on the inputs to useEffect, and replace them with useMemo. I told it to be careful to only replace cases where the change did not introduce any behavior changes, and I put it in plan mode first.

It gave me an impressive plan of attack, including a reasonable way to determine which code it could safely modify. I told it to start with just a few files and let me review; its changes looked good. So I told it to proceed with the rest of the code.

It made hundreds of changes, as expected (big code base). And most of them were correct! Except the places where it decided to do things like put its "const x = useMemo(...)" call after some piece of code that used the value of "x", meaning I now had a bunch of undefined variable references. There were some other missteps too.
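The refactor replaces derived state, where a value is kept in `useState` and a `useEffect` exists only to call `setX(f(a, b))`, with a value computed inline: `const x = useMemo(() => f(a, b), [a, b])`. Because that is a plain `const`, it has to sit above every line that reads `x`, which is the rule the misplaced calls broke. TypeScript and ESLint's `no-use-before-define` rule can flag that kind of ordering mistake. The sketch below does not use React. `createMemoSlot` is a hypothetical helper that only mimics the caching rule: keep the last value and recompute it when a dependency changes. The one detail taken from React is that dependencies are compared with `Object.is`.

```typescript
// Not React: a hypothetical stand-in that mimics useMemo's caching rule.
type Deps = readonly unknown[];

// Dependencies are compared pairwise with Object.is, as React documents for useMemo.
function sameDeps(a: Deps, b: Deps): boolean {
  return a.length === b.length && a.every((dep, i) => Object.is(dep, b[i]));
}

// One memo "slot": remembers the last value and the deps it was computed from.
function createMemoSlot<T>() {
  let cached: { value: T; deps: Deps } | undefined;
  return (compute: () => T, deps: Deps): T => {
    if (cached === undefined || !sameDeps(cached.deps, deps)) {
      cached = { value: compute(), deps };
    }
    return cached.value;
  };
}

// Usage: the derived total is computed when it is read, with no extra
// state update afterwards (which the useState + useEffect version needed).
let computations = 0;
const memo = createMemoSlot<number>();
const total = (items: number[]): number =>
  memo(() => {
    computations += 1;
    return items.reduce((sum, n) => sum + n, 0);
  }, [items]);

const items = [1, 2, 3];
console.log(total(items)); // 6 (computed)
console.log(total(items)); // 6 (same array reference, cached)
console.log(total([1, 2, 3])); // 6 (new array reference, recomputed)
console.log(computations); // 2
```

Comparing by reference is why a dependency like a freshly built array or object defeats the cache.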

I tried to convince it to fix the places where it had messed up, but it quickly started wanting to make larger structural changes (extracting code into helper functions, etc.) rather than just moving the offending code a few lines higher in the source file. Eventually I gave up trying to steer it and, with the help of another dev on my team, fixed up all the broken code by hand.

It probably still saved time compared to making all the changes myself. But it was way more frustrating.


dcre 100 days ago

One tip I have is that once you have the diff you want to fix, start a new session and have it work on the diff fresh. They’ve improved this, but it’s still the case that the farther you get into the context window, the dumber and less focused the model gets. I learned this from the Claude Code team themselves, who have long advised starting over rather than trying to steer a conversation that has started down a wrong path.

I have heard from people who regularly push a session through multiple compactions. I don’t think this is a good idea. I virtually never do this — when I see context getting up to even 100k, I start making sure I have enough written to disk to type /new, pipe it the diff so far, and just say “keep going.” I learned recently that even essentials like the CLAUDE.md part of the prompt get diluted through compactions. You can write a hook to re-insert it but it's not done by default.

This fresh context thing is a big reason subagents might work where a single agent fails. It’s not just about parallelism: each subagent starts with a fresh context, and the parent agent only sees the result of whatever the subagent does — its own context also remains clean.
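A toy model of that bookkeeping follows. It assumes a context is just a list of messages whose size is approximated by word count; nothing here is Claude Code's actual mechanism, and `runSubagent` is hypothetical. Each worker accumulates its own transcript, but only a one-line summary reaches the orchestrator.

```typescript
// Toy model only: a "context" is a list of messages, sized by word count.
type Context = string[];

function size(ctx: Context): number {
  return ctx.reduce((n, msg) => n + msg.split(/\s+/).filter(Boolean).length, 0);
}

// Hypothetical subagent: starts from a fresh context holding only its task,
// does some simulated work there, and returns a short summary. The transcript
// itself is not returned, so it never lands in the orchestrator's context.
function runSubagent(task: string, workSteps: number): { summary: string; workerSize: number } {
  const worker: Context = [task];
  for (let i = 0; i < workSteps; i++) {
    worker.push(`step ${i}: read files, tried an edit, ran the checks`);
  }
  return { summary: `done: ${task}`, workerSize: size(worker) };
}

// Orchestrator: keeps only the plan plus one summary per delegated task.
const orchestrator: Context = ["plan: migrate 3 modules"];
let largestWorker = 0;
for (const task of ["module a", "module b", "module c"]) {
  const { summary, workerSize } = runSubagent(task, 50);
  orchestrator.push(summary);
  largestWorker = Math.max(largestWorker, workerSize);
}

console.log(size(orchestrator), largestWorker); // 13 502
```

In this model the orchestrator ends at 13 words while each worker used 502; a single agent doing all three tasks inline would carry all three transcripts (about 1,500 words) in one context.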

kjohanson 100 days ago

Yeah, I start most of my sessions now with “read the diff between this branch and main”. Seems like it grounds and focuses it.

eru 100 days ago

Slight tangent: you want to read the diff between your branch and the merge-base with origin/main. Otherwise you get lots of spurious spam in your diff, if main moved since you branched off.
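In Git, the three-dot form `git diff origin/main...HEAD` does exactly this: it is shorthand for `git diff $(git merge-base origin/main HEAD) HEAD`. To show why that removes the spam, here is a toy model over a hypothetical commit graph (not Git's implementation) that computes best common ancestors the way `git merge-base` defines them: common ancestors that are not themselves ancestors of another common ancestor.

```typescript
// Toy commit graph: commit id -> parent ids (hypothetical data, not a real repo).
type Graph = Record<string, string[]>;

// Every commit reachable from `start`, including `start` itself, so that,
// as with git merge-base, a commit already on the other side is its own base.
function ancestors(graph: Graph, start: string): Set<string> {
  const seen = new Set<string>();
  const stack = [start];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...(graph[id] ?? []));
  }
  return seen;
}

// Best common ancestors: common ancestors that are not ancestors of another common ancestor.
function mergeBases(graph: Graph, a: string, b: string): string[] {
  const fromB = ancestors(graph, b);
  const common = Array.from(ancestors(graph, a)).filter((id) => fromB.has(id));
  return common
    .filter((c) => !common.some((d) => d !== c && ancestors(graph, d).has(c)))
    .sort();
}

// main:    A - B - C - D   (main moved on after the branch was cut at B)
//                \
// feature:        E - F
const commits: Graph = { A: [], B: ["A"], C: ["B"], D: ["C"], E: ["B"], F: ["E"] };
console.log(mergeBases(commits, "D", "F")); // [ 'B' ]
```

Diffing F directly against D would also show C and D's changes, in reverse; diffing against B shows only the work done in E and F.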

dcre 99 days ago

In jj this is jj diff -f 'fork_point(trunk() | @)'. I have an alias for it.

eru 99 days ago

What's jj? In Git I also have an alias for diffing against the merge-base. (It's also what GitHub gives you by default in the webview.)

nextaccountic 100 days ago

One thing that seems important is to have the agent write down its plan and any useful memory in markdown files, so that later invocations can just read from them.

Glyptodon 100 days ago

IMO it seems to start "forgetting" or "overlooking" CLAUDE.md well before the context window is full.

sidrag22 100 days ago

subagents are huge: I could execute on a massive plan that should easily fill up a 200k context window and be done at around 60k for the orchestration agent.

as a cheapass, being able to pass off the simple work to agents that are cheaper per token is also just great. I've got a handful of tasks I can happily delegate to a haiku agent, and anything requiring a bit of reasoning goes to sonnet.

Feels like opus is almost a cheat code: when I do get stuck, I just bust out a full opus workflow instead and it usually destroys everything I was struggling with. Like playing on easy mode.

as cool as this stuff is, I kinda still wish I'd been grandfathered into the plan with no weekly limit and only the 5-hour window limits; I'd just be blissfully hammering opus.

ramesh31 100 days ago

>"This fresh context thing is a big reason subagents might work where a single agent fails. It’s not just about parallelism: each subagent starts with a fresh context, and the parent agent only sees the result of whatever the subagent does — its own context also remains clean."

This is the true power of agent teams: https://code.claude.com/docs/en/agent-teams

You maintain very low context usage in the main thread; just orchestration and planning details, while each individual team member remains responsible for their own. Allows you to churn through millions of output tokens in a fraction of the time.

olalonde 100 days ago

Same here. I don't understand how people leave it running on an "autopilot" for long periods of time. I still use it interactively as an assistant, going back and forth and stepping in when it makes mistakes or questionable architectural decisions. Maybe that workflow makes more sense if you're not a developer and don't have a good way to judge code quality in the first place.

There's probably a parallel with the CMSes and frameworks of the 2000s (e.g. WordPress or Ruby on Rails). They massively improved productivity, but as a junior developer you could get pretty stuck if something broke or you needed to implement an unconventional feature. I guess it must feel a bit similar for non-developers using tools like Claude Code today.

ramesh31 100 days ago

>Same here. I don't understand how people leave it running on an "autopilot" for long periods of time.

Things have changed. The models have reached a level of coherence that they can be left to make the right decisions autonomously. Opus 4.6 is in a class of its own now.

devld 100 days ago

A non-technical client of mine has built an entire app with a very large feature set with Opus. I declined to clean it up; I was afraid it would be impossible and too much risk. I think we are at a level where it can build and auto-correct its mistakes, but the code is still slop and kind of dangerous to put in production, at least if you care about the most basic security.

conception 100 days ago

Branch first so you can just undo. I think this would have worked with sub agents and /loop maybe? Write all items to change to a todo.md. Have it split up the work with haiku sub agents doing 5-10 changes at a time, marking the todos done, and /loop until all are done. You’ll succeed I suspect. If the main claude instance compacts its context - stop and start from where you left off.
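The todo.md loop described here can be sketched as follows. The GitHub-style task-list format (`- [ ] item`), the `nextBatch`/`markDone` helpers, the file names, and the batch size of 5 (the low end of the suggested 5-10) are all assumptions; the step where a Haiku subagent would actually make the edits is left as a comment.

```typescript
// Assumed todo.md format: GitHub-style task-list lines.
const OPEN = /^- \[ \] /;

// The next `size` unchecked items, in file order.
function nextBatch(todo: string, size: number): string[] {
  return todo
    .split("\n")
    .filter((line) => OPEN.test(line))
    .slice(0, size)
    .map((line) => line.replace(OPEN, ""));
}

// Tick off the given items and leave every other line untouched.
function markDone(todo: string, done: string[]): string {
  const finished = new Set(done);
  return todo
    .split("\n")
    .map((line) =>
      OPEN.test(line) && finished.has(line.replace(OPEN, "")) ? line.replace(OPEN, "- [x] ") : line
    )
    .join("\n");
}

// Hypothetical list of files still to migrate.
let todo = ["# useMemo migration", ...Array.from({ length: 12 }, (_, i) => `- [ ] src/file${i + 1}.tsx`)].join("\n");

let rounds = 0;
for (let batch = nextBatch(todo, 5); batch.length > 0; batch = nextBatch(todo, 5)) {
  // Here a cheap subagent would edit the files in `batch`; this sketch only records completion.
  todo = markDone(todo, batch);
  rounds += 1;
}
console.log(rounds); // 3 (batches of 5, 5 and 2)
```

Keeping progress in the file rather than in the conversation is what lets a fresh session pick up where a compacted one left off.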

koreth1 100 days ago

It actually did automatically break the work up into chunks and launched a bunch of parallel workers to each handle a smaller amount of work. It wasn't doing everything in a single instance.

The problem wasn't that it lost track of which changes it needed to make, so I don't think checking items off a todo list would have helped. I believe it did actually change all the places in the code it should have. It just made the wrong changes sometimes.

But also, the claim I was responding to was, "I start with a PRD, ask for a step-by-step plan, and just execute on each step at a time." If I have to tell it how to organize its work and how to keep track of its progress and how to execute all the smaller chunks of work, then I may get good results, but the tool isn't as magical (for me, anyway) as it seems to be for some other people.

monkpit 100 days ago

The next line in the comment you’re responding to is

> Sometimes ideas are dumb, but checking and guiding step by step helps it ship working things in hours.

which matches my experience exactly. I consider it to be about as magical as the parent comment is claiming, but I wouldn’t call it totally automatic.

a13n 100 days ago

If you use ESLint and tell it in CLAUDE.md how to run lint, it will run lint itself and find and fix most issues like this.

Definitely not ideal, but sure helps.

jdkoeck 100 days ago

Undefined variable references? Did you not instruct it to run the TypeScript compiler after changes?

stpedgwdgfhgdd 100 days ago

Start over, create a new plan with the lessons learned.

You need to converge on the requirements.

dyauspitr 100 days ago

You’re using it wrong. As soon as it starts going off the rails and you’ve had to repeat yourself, drop the whole session and start over.

saghm 100 days ago

One of the more subtle points that seems to be crucial is that it works a lot better when it can use the context for its own work rather than having it polluted by unrelated details. Even better than restarting once it's off the rails is avoiding that as much as possible by proactively starting a new conversation as soon as anything in the existing one's history stops being relevant. I've found it more effective to start a fresh session and manually tell it most of what's currently in the context, skipping the irrelevant bits even if they're fairly small, than to rely on it to figure out that they're no longer relevant (or to give it instructions saying so, which feels like a crapshoot as to whether it will actually prune anything or just bloat things further by adding that instruction to the mix).

To echo what the parent comment said, it's almost frustrating how effective it can be at certain tasks that I wouldn't ever have the patience for. At my job recently I needed to prototype calling some Python code via WASM using the Rust wasmtime engine. Once I had set up the code structure with the bytes for the WASM component, the arguments I wanted to pass to the function, and the WIT describing the function's interface, it filled in all of the boilerplate needed to make the function calls work, on the first try, within a minute or two. Reading through all the documentation and figuring out exactly which half dozen assorted things I had to import and hook up together in the correct order would probably have taken me an hour at minimum.

I don't have any particular insight on whether or not these tools will become even more powerful over time, and I still have fairly strong concerns about how AI tools will affect society (both in terms of how they're used and the amount of energy used to produce them in the first place), but given how much the tech industry tends to prioritize productivity over social concerns, I have to assume that my future employment is going to be heavily impacted by my willingness to adopt and use these tools. I can't deny at this point that having it as an option would make me more productive than if I refuse to use it, regardless of my personal opinions on it.