You should try switching it up. Write the tests and then ask the LLM to write the code that makes them pass. I find I'm more likely to learn something in this mode.
I'd argue that having usable LLMs kind of brings out how problematic TDD is.
Imagine the dumbest function you have to write: a product A and a street address as input, and the shipping cost as an output.
How many test cases would you write to be absolutely sure that function actually does what you want it to do, and to be confident it doesn't have weird exceptions that the LLM injected randomly? I'd assume you'd still vet the code written by the LLM, but if it's hundreds of rambling lines doing weird stuff to get the right result, is it really faster than writing it yourself?
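To make the question concrete, here is a rough sketch of what a tests-first version of that shipping function could look like. Everything in it is an assumption for illustration: the `Product` fields, the rate table, the "address ends with a country code" convention, and the `shipping_cost` name are all invented. Three happy-path cases and two rejections are nowhere near "absolutely sure", which is rather the point being argued.

```python
import unittest
from dataclasses import dataclass

# Everything below is hypothetical: the Product fields, the rate table
# and the country-code address format are invented for illustration.


@dataclass(frozen=True)
class Product:
    name: str
    weight_kg: float


# Flat rate per kg by destination country (made-up numbers, in cents).
RATE_CENTS_PER_KG = {"US": 500, "CA": 800}


def shipping_cost(product: Product, address: str) -> int:
    """Return the cost in cents; the address must end with ', <country code>'."""
    if product.weight_kg <= 0:
        raise ValueError("weight must be positive")
    country = address.rsplit(",", 1)[-1].strip().upper()
    if country not in RATE_CENTS_PER_KG:
        raise ValueError(f"cannot ship to {country!r}")
    return round(product.weight_kg * RATE_CENTS_PER_KG[country])


class ShippingCostTest(unittest.TestCase):
    def test_known_destinations(self):
        # Table of (product, address, expected cost in cents).
        cases = [
            (Product("book", 1.0), "1 Main St, Springfield, US", 500),
            (Product("book", 1.0), "1 Main St, Toronto, ca", 800),
            (Product("anvil", 20.5), "1 Main St, Springfield, US", 10250),
        ]
        for product, address, expected in cases:
            with self.subTest(product=product.name, address=address):
                self.assertEqual(shipping_cost(product, address), expected)

    def test_rejects_bad_input(self):
        # Unknown destination and non-positive weight should both fail loudly.
        with self.assertRaises(ValueError):
            shipping_cost(Product("book", 1.0), "1 Main St, Paris, FR")
        with self.assertRaises(ValueError):
            shipping_cost(Product("ghost", 0.0), "1 Main St, Springfield, US")
```

In the workflow described at the top of the thread, only the test class would be written by hand and the function body would be what the LLM is asked to fill in.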
If it's hundreds of rambling lines then I'm not going to be able to get it past my linter anyhow (complexity thresholds), nor am I going to be able to get it past my team when they review it. So yeah, that's a problematic case, but it's one I'm going to have to refactor to avoid with or without an LLM in the loop.
TDD works best if you default to testing at the outer shell of the app - e.g. translating a user story into steps executed by Playwright against your web app, and only TDDing lower layers once you've used those higher-level tests to evolve a useful abstraction underneath the outer shell.
It seems to be taught in a fucked up way though where you imagine you want a car object and a banana object and you want to insert the banana into a car or some other kind of abstract nonsense.
I don't know what normally is, but I'd say it works pretty well.
Often the challenge is that the context for what you're trying to do is sprawling. There are just too many files and they're all too long: you end up exceeding the context window or filling it with 99% irrelevant stuff. Typically the structures you build for tests are smaller and more focused on the particular instance you're worried about, which I think is a better way to talk to an LLM.
You don't have to explain, for instance, that there's data in production which doesn't match the schema in the code so it must be cautious to avoid running afoul of that difference. Instead you've mocked that data, so it's right there in the same code with the test that it's trying to make pass.
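As a rough illustration of that last point, here is a small sketch where the schema mismatch lives right next to the test. The `Order` schema, the `LEGACY_ROWS` fixture and `parse_order` are all made up for the example; the idea is only that the awkward production shapes are visible in the same file as the assertion, so nobody has to describe them in a prompt.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical schema: the code claims every order has an integer quantity.
@dataclass
class Order:
    order_id: str
    quantity: int


# Mocked production rows (invented for this sketch): some legacy records
# store quantity as a string, and some are missing it entirely.
LEGACY_ROWS = [
    {"order_id": "a1", "quantity": 3},
    {"order_id": "a2", "quantity": "2"},
    {"order_id": "a3"},
]


def parse_order(row: dict) -> Optional[Order]:
    """Return an Order, or None when the row cannot be repaired."""
    raw = row.get("quantity")
    try:
        quantity = int(raw)  # int(None) raises TypeError, int("abc") ValueError
    except (TypeError, ValueError):
        return None
    return Order(order_id=row["order_id"], quantity=quantity)


def test_parse_order_tolerates_legacy_rows():
    parsed = [parse_order(row) for row in LEGACY_ROWS]
    assert parsed == [Order("a1", 3), Order("a2", 2), None]
```

Here the fixture and the test are the part you'd hand over; `parse_order` stands in for whatever the LLM writes to make the test pass.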