by vips7L 24 days ago
Aren’t LLMs notorious for just making tests pass and not actually testing functionality?

I’ve never seen Claude do that. It makes the new tests pass by fixing previously unknown bugs in my experience.
I had it do that about a month ago. It changed test data, which caused another test to fail, and instead of isolating things it decided to flip an assert.
That's because Opus needed vacation and they routed your requests to its less sophisticated cousin, Claude Dynamite. ;)
I love Claude but on several occasions I've had it do some really funky stuff just to get tests passing
Yeah, in 2024.
You have to keep an eye on them, but they don't just make tests pass.
Claude Sonnet 4 (this time last year) did do this. It once made a simulation of a test script passing. Literally a script that just echoed test names and then said pass.
Change happens fast, a year old model is pretty outdated.

I'm sure it can happen, which is why I said to keep an eye out. Its main mode of operation is not to cook the tests, however.

Happened to me, 3 days ago - it deleted some tests and flipped assertions after I had told it not to change any assertions.

Our team was doing a similar task to move between test frameworks, and I had to do a git diff of hundreds of thousands of lines to try and work out where a test had disappeared to.

> 3 days ago

Your fault. You should have used a model from 0.000005 seconds ago!

Reading is difficult.
> Change happens fast, a year old model is pretty outdated.

What change? That you should not fake the results of a test, because that defeats the whole purpose of a test, has been known since before there were computers.

I don't know, the weather?