

by acituan 842 days ago

Unless well separated, this will easily turn developer-hostile: some clueless management demands high coverage, enthusiastic juniors smuggle in massive amounts of AI tests, and at the end of the day you need to get a rubber stamp from hard-to-maintain LLM-generated test code each time you want to submit your work.

Yes, authoring some tests might be sped up, but not necessarily maintaining them - or maintaining the code under test - because you are not necessarily generating good ones. Not to mention that sweating over tests usually helps developers check the design of the code early on too; if it is not very testable, it is usually not a good design either, e.g. insufficiently abstracted component contracts, which suck in a context where you need to coauthor code with others.
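To make the testability point concrete, here is a minimal Python sketch. Every name in it (`Clock`, `FixedClock`, `is_expired`, `is_expired_hard_to_test`) is hypothetical and made up for illustration. The first function reads the wall clock directly, so a test cannot control it; the second takes the clock as an explicit contract, which is the kind of design pressure that writing the test by hand tends to surface.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Protocol


class Clock(Protocol):
    """Hypothetical contract: anything that can report the current time."""

    def now(self) -> datetime: ...


@dataclass
class FixedClock:
    """Test double that satisfies the Clock contract with a frozen time."""

    current: datetime

    def now(self) -> datetime:
        return self.current


def is_expired_hard_to_test(issued_at: datetime, ttl: timedelta) -> bool:
    # Hidden dependency on the wall clock: a test cannot pin "now".
    return datetime.now() > issued_at + ttl


def is_expired(issued_at: datetime, ttl: timedelta, clock: Clock) -> bool:
    # The dependency is an explicit contract, so a test can pin the time.
    return clock.now() > issued_at + ttl


# Usage: with the time frozen at 12:30, a 15-minute token issued at 12:00
# has expired, while a one-hour token has not.
issued = datetime(2024, 1, 1, 12, 0)
clock = FixedClock(datetime(2024, 1, 1, 12, 30))
assert is_expired(issued, timedelta(minutes=15), clock)
assert not is_expired(issued, timedelta(hours=1), clock)
```

Whether a `Protocol` or a plain callable is the better seam is a judgment call; the point is only that the hard-to-test version hints at a missing contract.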

What some people miss is that tests are supposed to be sacrificial code, most of which will not catch anything during its lifetime - and that is OK, because it gives automated peace of mind and saves you from potential false clues when things fail. But that also means max investment into a probabilistic safeguard is not gonna pan out at all times; you will always have diminishing marginal utility as coverage tops out. Unless you're writing some high-traffic part of the execution path - e.g. a standard library - touting high coverage is not gonna pay off.

Not to mention that almost always an ecology of tests needs to be there - not just unit tests but integration, system etc. - to keep the thing chugging at the end of the day. Will LLMs sit in the design meetings and understand the architecture to write tests for it too? Or will what they can do be oversold at the expense of what should be done? A sense of "what is relevant" is needed while investing effort in tests - not just at write-time but also at design-time and maintain-time - which is what humans are pretty OK at, and AI tools are not.

What LLMs can save is the keystrokes of an experienced developer who already has a sense of what is a good thing to test and what is not. They can also be - and have been - a hindrance, making developers smuggle not-so-relevant things into the code.

I don't want an economy of producing keystrokes, I want an appropriately thought-out set of highly relevant keystrokes, and I want the latter well separated from the former so that their objective utility - or lack thereof - can be demonstrated in time.