by barakm 1669 days ago
Another “stupid senior” here and I say go for 75-80%. It’s almost exactly a classic Pareto situation in my book.

The parent post is right that lower coverage degrades rapidly... the difference between 65% and 75% is huge. But the ancestor post is right that there are large diminishing returns too.

I’ll qualify this by recommending that you lean hard into lints, type checkers, and the like. Eliminating whole classes of errors gives you the edge: rather than writing test cases in, say, Python to ensure that a string-mangling function raises the “correct” exception when passed an int… just enforce mypy checking instead. Then get your type coverage up to 75-80% as well. Fuzzers too. Get more overall coverage by letting the computer do the work.
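To make the mypy point concrete, here is a minimal Python sketch (the function `mangle` and its invariants are hypothetical examples, not from the thread). The type hints let mypy reject a call like `mangle(42)` statically, and a bare-bones random-input loop stands in for a real fuzzer or property-testing library:

```python
import random
import string


def mangle(s: str) -> str:
    # Hypothetical string-mangling function: reverse, then upper-case.
    return s[::-1].upper()

# No hand-written test for mangle(42) raising the "correct" exception:
# running `mypy` on a call like that reports an incompatible argument type
# ("int" where "str" is expected) before the code ever runs.


def fuzz_mangle(iterations: int = 1000, seed: int = 0) -> None:
    # Feed random ASCII strings to mangle() and check invariants that must hold.
    # ASCII only: str.upper() can change the length of some non-ASCII text.
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + string.punctuation + " "
    for _ in range(iterations):
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        out = mangle(s)
        assert len(out) == len(s), (s, out)      # length is preserved
        assert out[::-1] == s.upper(), (s, out)  # undoing the reversal gives upper(s)
```

The type checker removes a whole class of tests, and the random loop checks properties over many inputs instead of a handful of hand-picked cases.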


I'm a stupid principal and I go for 100%... when I can.

When I can't, I ponder.

I see how my design can be aided by coverage and ask "Hey, I don't have coverage for this, why did I write it?"

It's a worthwhile question. I wrote the code, but it is hard to exercise... so why did I write it?

Don't get me wrong, there is a bunch of stupid shit to contend with (I'm looking at you, MessageDigest and NoSuchAlgorithmException).
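In Java, `MessageDigest.getInstance` declares the checked `NoSuchAlgorithmException` even for standard algorithms such as SHA-256, so the catch block is nearly impossible to reach from a test. A rough Python analog is `hashlib.new`, which raises `ValueError` for an unknown algorithm name. Here is a minimal sketch of one common workaround (the `DigestUnavailable` error and `digest_hex` helper are hypothetical): take the algorithm name as a parameter so a test can reach the "can't happen" path:

```python
import hashlib


class DigestUnavailable(RuntimeError):
    # Hypothetical wrapper error for the "can't happen" missing-algorithm case.
    pass


def digest_hex(data: bytes, algorithm: str = "sha256") -> str:
    try:
        h = hashlib.new(algorithm)
    except ValueError as exc:
        # hashlib.new raises ValueError for an unknown name; with the default
        # "sha256" this branch is effectively unreachable in production.
        raise DigestUnavailable(algorithm) from exc
    h.update(data)
    return h.hexdigest()
```

A test can now pass a bogus algorithm name to cover the error branch, instead of leaving it as a permanent hole in the coverage report.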

Of course I prefer 100% coverage with well-written unit tests. But I’d also prefer to ship software and create value. There is a point of diminishing returns in trying to exercise every code path. Guessing at some hard rule (70%, no, 80%!) seems futile, since the right number varies with your own work. Experience tends to guide us: for example, critical paths should be tested at 100%, while code-generated RPC stubs are something we may skip. This is a trade-off we make, like so many in this field.
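Rather than one global number, the "experience guides us" approach can be written down as per-path thresholds. This is a minimal Python sketch: the policy, paths, and percentages are hypothetical, and `report` stands in for per-file line-coverage numbers from whatever coverage tool you use (`*_pb2.py` and `*_pb2_grpc.py` are the usual names of protobuf/gRPC-generated Python stubs):

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

# Hypothetical policy: (glob pattern, required line coverage %), first match wins.
# None means "don't gate on this file", e.g. code-generated RPC stubs.
POLICY: List[Tuple[str, Optional[float]]] = [
    ("*_pb2.py", None),
    ("*_pb2_grpc.py", None),
    ("billing/*", 100.0),  # a critical path held to 100%
    ("*", 75.0),           # everything else
]


def coverage_failures(report: Dict[str, float],
                      policy: List[Tuple[str, Optional[float]]] = POLICY) -> List[str]:
    # Return a message for each file below the threshold of its first matching pattern.
    failures: List[str] = []
    for path, pct in sorted(report.items()):
        for pattern, required in policy:
            if fnmatchcase(path, pattern):
                if required is not None and pct < required:
                    failures.append(f"{path}: {pct:.1f}% < {required:.1f}%")
                break
    return failures
```

Ordering the patterns from most to least specific keeps the policy readable: skips first, critical paths next, and a catch-all default last.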

Edit: we may be speaking past each other. I agree that your code should be testable, and if it isn’t, that’s a code smell worth investigating. That is separate from whether every thrown error should be exercised, every getter verified, etc. With unlimited time, and for some parts of the code, sure; but 100% leaves zero wiggle room.

A challenge with targets is that people game the metrics, which isn't great.

My gut is that we are in a dark age of the field, stuck between picking the low-hanging fruit and doing this exceptionally well. I'm not sure how we achieve balance, but I do ponder it.

For example, when building a small game, coverage is not really needed, since quality is measured more by playing the game.

In the domain of infrastructure, where I have toiled for over a decade, I have come to expect amazing coverage, since so many CoE/SEV/prod issues come down to "did we have a test for that?". I've worked with teams that reach not just 100% coverage but also add all sorts of E2E and other tests.

A key problem is that what is good in one domain is awful in another, so we can't treat a general metric as a good rule. Is 70% good? Well, maybe... maybe not...

As I reflect on simple games, I am working on a way to automatically build unit tests for a class of games I care about: board games. As I look into how to do AI for arbitrary games (I have a random player model for giggles at the moment), I find that I could use a bit of AI to build a minimal spanning set of cases which exercise all the code paths in different combinations.
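The commenter's Adama-based approach isn't shown here. As a rough Python sketch of the general idea only: play many random games, record which rule branches each playout hits, then greedily keep the playouts that add new coverage. The Nim-style game and the branch names are hypothetical, and greedy set cover is a standard approximation, so the resulting suite is small but not guaranteed minimal:

```python
import random
from typing import Dict, FrozenSet, List, Set


def play_random_game(seed: int, stones: int = 7) -> FrozenSet[str]:
    # Hypothetical Nim-style game: players alternately take 1-3 stones,
    # and whoever takes the last stone wins. Returns the rule branches hit.
    rng = random.Random(seed)
    hit: Set[str] = set()
    player = 0
    while True:
        legal = [n for n in (1, 2, 3) if n <= stones]
        if len(legal) < 3:
            hit.add("endgame_restricted_moves")
        take = rng.choice(legal)  # the "random player model"
        hit.add(f"take_{take}")
        stones -= take
        if stones == 0:
            hit.add(f"win_player_{player}")
            return frozenset(hit)
        player = 1 - player


def greedy_minimal_suite(seeds: range, stones: int = 7) -> List[int]:
    # Greedy set cover: repeatedly pick the seed covering the most uncovered branches.
    traces: Dict[int, FrozenSet[str]] = {s: play_random_game(s, stones) for s in seeds}
    uncovered: Set[str] = set().union(*traces.values())
    chosen: List[int] = []
    while uncovered:
        best = max(traces, key=lambda s: len(traces[s] & uncovered))
        chosen.append(best)
        uncovered -= traces[best]
    return chosen
```

Seeding each playout makes it reproducible, so every chosen seed can be replayed as a deterministic unit test.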

This is possible because my language tightly couples state and compute (http://www.adama-lang.org/), but I believe it provides an interesting clue to a way out of this mess. As we look into AI to write code, why don't we start by using AI to automate testing, then bring a human into the loop to say "Yo, is this the right behavior?" and go from there.
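The human-in-the-loop step resembles approval (snapshot) testing: generated cases record their observed behavior, and anything new or changed is queued for a person to confirm. Here is a minimal Python sketch with hypothetical names, where `observed` and `approved` map case names to recorded outcomes:

```python
from typing import Dict, Iterable, List


def needs_review(observed: Dict[str, object], approved: Dict[str, object]) -> List[str]:
    # A case needs a human ("Yo, is this the right behavior?") if it is new
    # or its observed outcome differs from the previously approved one.
    return sorted(name for name, outcome in observed.items()
                  if name not in approved or approved[name] != outcome)


def approve(observed: Dict[str, object], approved: Dict[str, object],
            names: Iterable[str]) -> Dict[str, object]:
    # Record the human's sign-off: copy the chosen observed outcomes into a new approved set.
    updated = dict(approved)
    for name in names:
        updated[name] = observed[name]
    return updated
```

Once a case is approved, it becomes an ordinary regression test: any later change in its outcome shows up in `needs_review` again.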