by gck1
299 days ago
LLMs (Sonnet and Gemini, from what I tested) tend to “fix” failing tests by either removing them outright or tweaking the assertions just enough to make them pass. The opposite happens too: sometimes they change the actual logic when what really needs updating is the test. In short, LLMs often get confused about where the problem lies: the code under test or the test itself. And no amount of context engineering seems to solve that.
Without giving the LLM (or the developer) the actual feature requirements, it is impossible to determine which one is wrong.
Which is why I think it is also sort of stupid to have the LLM generate tests when all it has access to is the implementation. At best, that tests the implementation as it is, but tests should be based on the requirements.
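A rough sketch of what I mean, in Python. Everything here is made up for illustration: the `apply_discount` function, and the assumed requirement that "orders of 100 or more get 10% off". The implementation has an off-by-one bug (`>` instead of `>=`). A test written by reading the code pins the bug in place, while a test written from the requirement catches it.

```python
def apply_discount(total: float) -> float:
    # Hypothetical implementation with a bug: the assumed requirement says
    # "100 or more", but the comparison excludes exactly 100.
    if total > 100:
        return total * 0.9
    return total


def test_mirrors_implementation() -> bool:
    # Derived from the code alone: asserts whatever the code does today,
    # so the bug becomes "expected" behavior.
    return apply_discount(100) == 100


def test_from_requirement() -> bool:
    # Derived from the assumed requirement: an order of exactly 100
    # qualifies for the discount, so the expected result is 90.
    return apply_discount(100) == 90


print(test_mirrors_implementation())  # True
print(test_from_requirement())        # False
```

Looking only at the code and these two tests, nothing tells you which one is "right". Only the requirement does.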