	by p1necone 74 days ago
	The "this test failure is preexisting so I'm going to ignore it" thing has been happening a lot for me lately, and it's so annoying. Unless it makes a change, immediately runs the tests, and it's obvious from the name or contents that the failing test is directly related to that change, it will ignore the failure and not try to fix it.

4 comments

Shebanator 74 days ago

This problem has been around for a long time. Not only that, but it would say this even when the problems were directly caused by its own code.

I put a line in my CLAUDE.md that says "If a test doesn't pass, fix it regardless of whether it was pre-existing or in a different part of the code."


latentsea 74 days ago

This should be part of the system prompt. It's absolutely unacceptable to not at least try to investigate failures like this. I absolutely hate when it reaches this conclusion on its own and just continues on as if it's doing valid work.


foltik 74 days ago

Based on the recent leaks, their system prompt explicitly nudges the model not to do anything outside of what was asked. That could very well explain why it’s not fixing preexisting broken tests.

“Don't add features, refactor code, or make "improvements" beyond what was asked.”

https://www.dbreunig.com/2026/04/04/how-claude-code-builds-a...


hakanderyal 74 days ago

And it's very valid. Because otherwise you would ask Claude to trim a tree and it would go raze the whole forest and plant new seeds. This was the primary pain point last year, especially with Sonnet.


cmrdporcupine 73 days ago

Whatever prompting OpenAI has with Codex / GPT 5.4 seems superior here, then.

It's very surgical and careful around incremental refactoring, etc. but it also doesn't avoid responsibility.


flakes 74 days ago

> "this test failure is preexisting so I'm going to ignore it"

Critical finding! You spotted the smoking gun!


cmrdporcupine 73 days ago

I will note that this "out" that Claude takes was a) less frequent in Opus 4.5 and that time frame, and b) notably not something that Codex does.

I don't trust the code that Claude writes at all. If I have to use it (they gave me a free month recently, so I use it...), I not only review it carefully but also have Codex do a thorough review.

Claude "cheats" and leaves hacks and has Dunning-Kruger.

All of this is very exhausting. I am enjoying writing my own code with these tools (to get long running personal projects out the door) but the effect that these tools are having on teams is terrifyingly corrosive and it's making me want to take an early retirement from the profession.

Yes we can write a lot of code quickly. But at what cost? And what even use is all this code now anyways?


dboreham 74 days ago

That said I've worked with several humans who did/said the exact same thing.


boesboes 74 days ago

But did they say that about tests they had just added themselves, too? Had Claude try that on me a couple of times >_<


gmassman 73 days ago

Usually these were the developers who said their code didn’t need tests because it’s obviously correct/too simple to need them. And then their bug causes a crash that needs to be fixed over the weekend :/
