> The Claude C Compiler illustrates the other side: it optimizes for passing tests, not for correctness. It hard-codes values to satisfy the test suite. It will not generalize.

This is one of the pain points I am suffering at work: workers ask coding agents to generate some code, and then to generate test coverage for that code. The LLM happily churns out unit tests that simply reinforce the code's existing behaviour. At no point does anyone stop and ask whether the generated code implements the desired functional behaviour of the system (the "business logic").

The icing on the cake is that LLMs are producing so much code that humans are just rubber-stamping all of it. Off to merge and build it goes.

I have no constructive recommendations; I feel the industry will keep its foot on the pedal until something catastrophic happens.