

by 112233 231 days ago

Is Claude Code with both Sonnet and Opus agentic enough? Because it is constantly finding creative ways to ignore direct, repeated instructions ("user asked X but it is hard, let's do Y instead"), implement fake tests ("feature X is complex. we need to test it completely. let's write a script that will create the files that feature X would have created, then test that the files exist"), and sabotage or delete working code ("we need to track the FD of the open file (runs strace). The FD is 5 (hardcodes 5 in the code instead of implementing anything useful) tests pass now!").

I have not experienced this level of malice and sweet-talking work avoidance from anyone. It apologizes like an alcoholic, then proceeds to double down.

Can you force it to produce actually useful code? Yes, by repeatedly yelling at it to please follow the instructions. In the process, it will break, delete, or introduce hard-to-find bugs in the rest of the codebase.

I'm really curious whether anyone actually has this thing working, or whether they simply haven't bothered to read the generated code.

2 comments

z33k 230 days ago

You need to use the features that Claude Code gives you in order to be successful with it. Your build and tests should be in a Stop hook that prevents Claude from stopping if the build or tests fail. Combining this with a Stop hook that bails out if the first hook has already failed n times prevents infinite loops.
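A minimal sketch of that idea, folded into one script instead of two separate hooks. Everything named here is an assumption for illustration: `TEST_COMMAND`, `COUNTER_FILE`, and `MAX_FAILURES` are hypothetical, and the script assumes the hook contract where exiting with status 2 blocks the stop and the hook's stderr is passed back to the agent. Check the Claude Code hooks documentation for your version before relying on that.

```python
import subprocess
import sys
from pathlib import Path
from typing import Tuple

MAX_FAILURES = 3                            # hypothetical value for "n"
COUNTER_FILE = Path(".stop_hook_failures")  # hypothetical state file
TEST_COMMAND = ["make", "test"]             # hypothetical; use your own build/test command

ALLOW_STOP = 0  # let the agent stop
BLOCK_STOP = 2  # assumed "blocking" exit status for a hook


def decide(tests_passed: bool, failures_so_far: int, max_failures: int) -> Tuple[int, int]:
    """Return (exit_code, new_failure_count) for one Stop-hook run."""
    if tests_passed:
        return ALLOW_STOP, 0          # success resets the counter
    failures = failures_so_far + 1
    if failures >= max_failures:
        return ALLOW_STOP, 0          # bail out so the agent cannot loop forever
    return BLOCK_STOP, failures       # keep the agent working on the failure


def read_count(path: Path) -> int:
    """Read the stored failure count, treating a missing or corrupt file as 0."""
    try:
        return int(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return 0


def main() -> int:
    result = subprocess.run(TEST_COMMAND, capture_output=True, text=True)
    exit_code, count = decide(result.returncode == 0, read_count(COUNTER_FILE), MAX_FAILURES)
    COUNTER_FILE.write_text(str(count))
    if exit_code == BLOCK_STOP:
        # Send the tail of the test output to stderr so the agent can see what failed.
        print(result.stdout[-2000:] + result.stderr[-2000:], file=sys.stderr)
    return exit_code

# When installing this as a hook script, end the file with: sys.exit(main())
```

Keeping the decision in a pure function (`decide`) makes the bail-out rule easy to check without running a real build. Whether the counter should reset after a bail-out is a design choice; this sketch resets it so the next task starts fresh.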

With anything above a toy project, you need to be really good at context window management. Usually this means using subagents and scoping prompts correctly by placing CLAUDE.md files next to the relevant code. Your main conversation's context window usage should pretty much never be above 50%. Use the /clear command between unrelated tasks. Consider whether recurring sequences of tool calls could be unified into a single skill.

Instead of sending instructions to the agent straight away, try planning with it and prompting it to ask you questions about your plan. The planning phase is a good place to give Claude more space to think with "think > think hard > ultrathink". If you are still struggling with the agent not complying, try adding emphasis with "YOU MUST" or "IMPORTANT".

wvenable 230 days ago

As I'm getting better and better results with it, I'm having it do more and more things. I went through a complete agentic refactor of a project from Angular 17 to Angular 20 (RxJS to Signals) and I'd say it did it perfectly. A few times I'd get it to summarize and start a new chat, because it can start to get less effective when the history gets too long. I also had to iterate on what I wanted and do things a step at a time, although it was very clear that it also wanted to do things in pieces and test each major change before continuing on.

I think, like any tool, it has its pros and cons, and the more you use it the more you figure out how to make the best use of it and when to give up.