HN Mirror



	by vips7L 24 days ago
	Aren’t LLMs notorious for just making tests pass and not actually testing functionality?

3 comments

mcintyre1994 23 days ago

I’ve never seen Claude do that. It makes the new tests pass by fixing previously unknown bugs in my experience.


bcrosby95 23 days ago

I had it do it about a month ago. It changed test data, which caused another test to fail, and instead of isolating things it decided to flip an assert.


cyclopeanutopia 23 days ago

That's because Opus needed vacation and they routed your requests to its less sophisticated cousin, Claude Dynamite. ;)


weird-eye-issue 23 days ago

I love Claude but on several occasions I've had it do some really funky stuff just to get tests passing


senordevnyc 23 days ago

Yeah, in 2024.


cinntaile 23 days ago

You have to keep an eye on them, but they don't just make tests pass.


kdjkskdndn 23 days ago

Claude Sonnet 4 (this time last year) did do this. It once made a simulation of a test script passing. Literally a script that just echoed test names and then said pass.


cinntaile 23 days ago

Change happens fast, a year old model is pretty outdated.

I'm sure it can happen, hence why I said to keep an eye out. Its main mode of operation is not to cook the tests however.


yw3410 23 days ago

Happened to me 3 days ago: it deleted some tests and flipped assertions after I had told it explicitly not to change any assertions.

Our team was doing a similar task to move between test frameworks, and I had to do a git diff of hundreds of thousands of lines to try and work out where a test had disappeared to.


LtWorf 23 days ago

> 3 days ago

Your fault. You should have used a model from 0.000005 seconds ago!


cinntaile 23 days ago

Reading is difficult.


customguy 23 days ago

> Change happens fast, a year old model is pretty outdated.

What change? That you should not fake the results of a test, because that defeats the whole purpose of a test, has been known since before there were computers.


cinntaile 23 days ago

I don't know, the weather?
