| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vadansky 358 days ago

I had a particularly hard parsing problem so I setup a bunch of tests and let the LLM churn for a while and did something else.

When I came back all the tests were passing!

But as I ran it live a lot of cases were still failing.

Turns out the LLM hardcoded the test values as “if (‘test value’) return ‘correct value’;”!

6 comments

ffsm8 358 days ago

Missed opportunity for the LLM, could've just switched to Volkswagen CI

https://github.com/auchenberg/volkswagen

link

EGreg 358 days ago

This is gold lol

link

artursapek 358 days ago

lmfao

link

mikeocool 358 days ago

Yeah — I had something like this happen as well — the llm wrote a half decent implementation and some good tests, but then ran into issues getting the tests to pass.

It then deleted the entire implementation and made the function raise a “not implemented” exception, updated the tests to expect that, and told me this was a solid base for the next developer to start working on.

link

bluefirebrand 358 days ago

This is the most accurate Junior Engineer behavior I've heard LLMs doing yet

link

vunderba 358 days ago

I've definitely seen this happen before too. Test-driven development isn't all that effective if the LLM's only stated goal is to pass the tests without thinking about the problem in a more holistic/contextual manner.

link

matsemann 358 days ago

Reminds me of trying to train a small neural net to play Robocode ~10+ years ago. Tried to "punish" it for hitting walls, so next morning I had evolved a tanks that just stood still... Then punished it for standing still, ended up with a tanks just vibrating, alternating moving back and forth quickly, etc.

link

vunderba 358 days ago

That's great. There's a pretty funny example of somebody training a neural net to play Tetris on the Nintendo entertainment system, and it quickly learned that if it was about to lose to just hit pause and leave the game in that state indefinitely.

link

amlib 358 days ago

I guess it came to the same conclusion as the computer in War Games, "The only way to win is not to play"

link

insane_dreamer 358 days ago

While I haven't run into this egregious of an offense, I have had LLMs either "fix" the unit test to pass with buggy code, or, conversely, "fix" the code to so that the test passes but now the code does something different than it should (because the unit test was wrong to start with).

link

FuckButtons 358 days ago

Seems like property based tests would be good for llms, it’s a shame that half the time coming up with a good property test can be as hard as writing the code.

link