|
I can't believe we're back to advocating for TDD. It was a failed paradigm that last few times we tried it. This time isn't any different because the fundamental flaw has always been the same: tests aren't proofs, they don't have complete coverage. Before anyone gets too confused, I love tests. They're great. They help a lot. But to believe they prove correctness is absolutely laughable. Even the most general tests are very narrow. I'm sure they help LLMs just as they help us, but they're not some cure all. You have to think long and hard about problems and shouldn't let tests drive your development. They're guardrails for checking bonds and reduce footguns. Oh, who could have guessed, Dijkstra wrote about program completeness. (No, this isn't the foolishness of natural language programming, but it is about formalism ;) https://www.cs.utexas.edu/~EWD/transcriptions/EWD02xx/EWD288... |
The price you pay for tests is that they need to be written and maintained. Writing and maintaining code is much more expensive than people think.
Or at least it used to be. Writing code with claude code is essentially free. But the defect rate has gone up. This makes TDD a better value proposition than ever.
TDD is also great because claude can fix bugs autonomously when it has a clear failing test case. A few weeks ago I used claude code and experts to write a big 300+ conformance test suite for JMAP. (JMAP is a protocol for email). For fun, I asked claude to implement a simple JMAP-only mail server in rust. Then I ran the test suite against claude's output. Something like 100 of the tests failed. Then I asked claude to fix all the bugs found by the test suite. It took about 45 minutes, but now the conformance test suite fully passes. I didn't need to prompt claude at all during that time. This style of TDD is a very human-time efficient way to work with an LLM.