| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by majormajor 24 days ago

I've gotten a lot of value out of LLM tools but without extensive feedback and direction I've found even the newer version of Opus pretty bad at writing good tests. First drafts are full of tests with some of these characteristics:

- good test, wrong layer, would turn into a mess of "wait why'd tests over there blow up for changes over here"

- mostly-good test, subtle issue (yes, this is the status quo with most human-written tests, but the risk of not being careful is that you (or the agent you throw at crap) now is overconfident in your future changes)

- weird silly test like "assert that calling the six statements in this method in the right order do the right thing" without... actually calling the method itself... so don't protect against changes to the method?

For non-greenfield work I've recently been much more happy with their out of the box code change quality then their attempts at adding coverage for those changes!

I think this is an area that has remained hard because putting directives in CLAUDE.md or whatnot for tests is generally gonna be so generic to be useless, like "put tests in the right place" without more module-specific context. Whereas if I'm making a non-greenfield change, I'm thinking much more in my prompt about constraints on the code itself, and much less about the current shape/state/organization of the tests or what to direct it on.

Properly used it's great. Definitely improved my test coverage a lot.

But it's entirely still in the world of "people who'd care to write good code before will write good code faster; other people will just write mediocre code faster."