This is why you write the tests first and then the code. Especially when fixing bugs, since you can be sure that the test properly fails when the bug is present.
When fixing bugs, yes. When designing an app not so much because you realize many unexpected things while writing the code and seeing how it behaves. Often the original test code would test something that is never built. It's obvious for integration tests but it happens for tests of API calls and even for unit tests. One could start writing unit tests for a module or class and eventually realize that it must be implemented in a totally different way. I prefer experimenting with the implementation and write tests only when it settles down on something that I'm confident it will go to production.
Where I'm at currently (which may change) is that I lay down the foundation of the program and its initial tests first. That initial bit is completely manual. Then when I'm happy that the program is sufficiently "built up", I let the LLM go crazy. I still audit the tests though personally auditing tests is the part of programming I like the very least. This also largely preserves the initial architectural patterns that I set so it's just much easier to read LLM code.
In a team setting I try to do the same thing and invite team members to start writing the initial code by hand only. I suspect if an urgent deliverable comes up though, I will be flexible on some of my ideas.
One thing I want to mention here is that you should try to write a test that not only prevents this bug, but also similar bugs.
In our own codebase we saw that regression on fixed bugs is very low. So writing a specific test for it, isn't the best way to spend your resources. Writing a broad test when possible, does.
Not sure how LLM's handle that case to come up with a proper test.
I'd argue the AI writing the tests shouldn't even know about the implementation at all. You only want to pass it the interface (or function signatures) together with javadocs/docstrings/equivalent.
Writing the tests first and then writing code to pass the tests is no better than writing the code first then writing tests that pass. What matter is that both the code and the tests are written independently, from specs, not from one another.
I think that it is better not to have access to tests when first writing code, as to make sure to code the specs and not code the tests that test the specs as something may be lost in translation. It means that I have a preference for code first, but the ideal case would be for different people to do it in parallel.
Anyway, about AI, in an AI writes both the tests and the code, it will make sure they match no matter what comes first, it may even go back and forth between the tests and code, but it doesn't mean it is correct.
Tests are your spec. You write them first because that is the stage when you are still figuring out what you need to write.
Although TDD says that you should only write one test before implementing it, encouraging spec writing to be an iterative process.
Writing the spec after implementation means that you are likely to have forgotten the nuance that went into what you created. That is why specs are written first. Then the nuance is captured up front as it comes to mind.
Tests are not any more or any less of a spec than the code. If you are implementing a HTTP server for instance, RFC 7231 are your specs, not your tests, not your code.
I would say that which come first between specs and code depend on the context. If you are implementing a standard, the specs of the standard obviously come first, but if you are iterating, maybe for a user interface, it can make sense to start with the code so that you can have working prototypes. You can then write formal documents and tests later, when you are done prototyping, for regression control.
But I think that leaning on tests is not always a good idea. For example, let's continue with the HTTP server. You write a test suite, but there is a bug in your tests, I don't know, you confuse error 404 and 403. The you write your code, correctly, run the tests, see that one of your tests fail and tell you have returned 404 and not 403. You don't think much, after all "the tests are the specs", and change the code. Congratulations, you are now making sure your code is wrong.
Of course, the opposite can and do happen, writing the code wrong and making passing test without thinking about what you actually testing, and I believe that's why people came up with the idea of TDD, but for me, test-first flip the problem but doesn't solve it. I'd say the only advantage, if it is one, is that it prevents taking a shortcut and releasing untested code by moving tests out of the critical path.
But outside of that, I'd rather focus on the code, so if something are to be "the spec", that's it. It is the most important, because it is the actual product, everything else is secondary. I don't mean unimportant, I mean that from the point of view of users, it is better for the test suite to be broken than for the code to be broken.
It is more like a meta spec. You still have to write a final spec that applies to your particular technical constraints, business needs, etc. RFC 7231 specifies the minimum amount necessary to interface with the world, but an actual program to be deployed into the wild requires much, much more consideration.
And for that, since you have the full picture not available to a meta spec, logically you will write it in a language that both humans and computers can understand. For the best results, that means something like Lean, Rocq, etc. However, in the real world you likely have to deal with middling developers straight out of learn to code bootcamps, so tests are the practical middle ground.
> I don't know, you confuse error 404 and 403.
Just like you would when writing RFC 7231? But that's what the RFC process is for. You don't have to skip the RFC process just because the spec also happens to be machine readable. If you are trying to shortcut the process, then you're going to have this problem no matter what.
But, even when shortcutting the process, it is still worthwhile to have written your spec in a machine-readable format as that means any changes to the spec automatically identify all the places you need to change in implementation.
> writing the code wrong and making passing test without thinking about what you actually testing
The much more likely scenario is that the code is right, but a mistake in the test leads it to not test anything. Then, years down the road after everyone has forgotten or moved on, when someone needs to do some refactoring there is no specification to define what the original code was actually supposed to do. Writing the test first means that you have proven that it can fail. That's not the only reason TDD suggests writing a test first, but it is certainly one of them.
> It is the most important, because it is the actual product
Nah. The specification is the actual product; it is what lives for the lifetime of the product. It defines the contract with the user. Implementation is throwaway. You can change the implementation code all day long and as long as the user contract remains satisfied the visible product will remain exactly the same.
> The much more likely scenario is that the code is right, but a mistake in the test leads it to not test anything.
What I usually do to prevent this situation is to write a passing test, then modify the code to make it fail, then revert the change. It also gives an occasion to read the code again, kind of like a review.
I have never seen this practice formalized though, good for me, this is the kind of things I do because I care, turning it into a process with Jira and such is a good way to make me stop caring.
Also, if you find after implementation that the spec wasn't specific enough, go ahead and refresh the spec and have the LLM redo the code, from scratch if necessary. Writing code is so cheap right now, it takes a different mindset in general.
Agreed 1000%. But that can be a lot of work; creating a good set of tests is nearly as much or often even more effort than implementing the thing being tested.
When LLMs can assist with writing useful tests before having seen any implementation, then I’ll be properly impressed.
from experience, AI is bad at TDD. they can infer tests based on written code, but are bad at writing generalised test unless a clear requirement is given, so you the engineer is doing most of the work anyway.
My day job has me working on code that is split between two different programming languages. I'd say LLMs are pretty good at TDD in one of those languages and a hot mess in the other.
Which, funny enough, is a pretty good reflection of how I thought of the people writing in those languages before LLMs: One considers testing a complete afterthought and in the wild it is rare to find tests at all, and when they are present they often aren't good. Whereas the other brings testing as a first-class feature and most codebases I've seen generally contain fairly decent tests.