| Translating from a natural language spec to code involves a truly massive amount of decision making. For a non trivial program, 2 implementations of the same natural language spec will have thousands of observable differences. Where we are today, that is agents require guardrails to keep from spinning out, there is no way to let agents work on code autonomously that won’t end up with all of those observable differences constantly shifting, resulting in unusable software. Tests can’t prevent this because for a test suite to cover all observable behavior, it would need to be more complex than the code. In which case, it wouldn’t be any easier for machine or human to understand. The only solution to this problem is that LLMs get better. Personally I think at the point they can pull this off, they can do any white collar job, and there’s not point in planning for that future because it results in either Mad Mad or Star Trek. |
I don't think "complex" is the right word here. A test suite would generally be more verbose than the implementation, but a lot of the time it can simply be a long list of input->output pairs that are individually very comprehensible and easily reviewable to a human. The hard part is usually discovering what isn't covered by the test case, rather than validating the correctness of the test cases you do have.