|
|
|
|
|
by rmonvfer
259 days ago
|
|
The only way this might work (IMO) is writing the tests yourself (but of course, this requires you to plan and design very meticulously in advance) and doing some kind of “blind TDD” where the LLM is not able to see the tests, only run them and act on the results. Even then, I’ve had Claude (Opus 4.1) bypass tests by hardcoding conditions as it found them so I’d say reliability for this method is not 100%. Having the LLM write the tests is… well, a recipe for destruction unless you babysit it and give it extremely specific restrictions (again, I’ve done this in mid to large sized projects with fairly comprehensive documentation on testing conventions and results have been mixed: sometimes the LLM does an okay job but tests obvious things, sometimes it ignores the instructions, sometimes it hardcodes or disables conditions…) |
|
Inferring intent from plain english prompts and context is a powerful way for computers to guess what you want from underspecified requirements, but the problem of defining what you want specifically always requires you to convey some irreducible amount of information. Whether it’s code, highly specific plain english, or detailed tests, if you care about correctness they all basically converge to the same thing and the same amount of work.