by charleslmunger 494 days ago

>Anecdotally I've always found that tests which cover real life bugs are in the class of test with the highest chance of catching future regressions. So even if it does exist, I'm still mildly skeptical of the idea that tests that catch bugs that compilers have been provoked into also catching are strictly negative.

Maybe this is a static typing thing - if the test won't build because you've made the bug inexpressible in the type system, what test can you even write?

>If I write even a single line of production code I have specified what that line of code is going to do.

>combination of code investigation, spiking and dialog with stakeholders

Does "spiking" in this context mean writing code without TDD?

>I think this is conflating "TDD is a panacea" with "if it is valuable to write a test, it's always better to write it before".

That's a weaker formulation of TDD than I've seen espoused and practiced, which is usually something more like "before you make any behavioral changes to the code under test, you must write a test that fails; then make your edit so the test passes, and repeat". The problem with your approach, at least for me, is that until I've messed around in the code a bit to see what the best approach to solving a problem is, I don't know what the best structure for a final test is, or whether I can make the change in a way that leverages existing tests to detect the bug.

> I'm not really sure why fault injection should be treated as special.

Suppose you write a test reproducing a bug that reacts to allocation failing. One common way to do that is to inject an allocator that fails on the specific allocation call you had a bug with. But how do you target that specific call? One way is to make the Nth allocation in a test fail - but even slight changes to the production code will make this test start testing something completely different. The solution is to have a fuzz test that injects failed allocations randomly per run - now you can be reasonably confident that even if the prod code changes over time, your fuzz test will still exercise all the allocation sites, preventing regression. I don't see why it's beneficial to write this first or last. Under your model the instrumented allocator setup is a replacement for the single test case that reproduces the original bug.
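To make the randomized fault injection idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `FaultInjectingAllocator`, `build_pair`, and `fuzz_build_pair` are invented names, the "allocator" just hands out integer handles, and the 0.3 failure probability is arbitrary. The point is only that the test asserts an invariant (nothing is leaked when any allocation fails) instead of targeting the Nth allocation.

```python
import random


class AllocationFailed(Exception):
    """Raised by the instrumented allocator to simulate out-of-memory."""


class FaultInjectingAllocator:
    """Hypothetical allocator that fails each call with a fixed probability."""

    def __init__(self, seed, failure_probability=0.3):
        self._rng = random.Random(seed)  # seeded so a failing run can be replayed
        self._p = failure_probability
        self.live = set()  # handles that are currently allocated
        self._next_handle = 0

    def alloc(self):
        if self._rng.random() < self._p:
            raise AllocationFailed()
        handle = self._next_handle
        self._next_handle += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.remove(handle)


def build_pair(allocator):
    """Stand-in for production code: two allocations, no leak if either fails."""
    first = allocator.alloc()
    try:
        second = allocator.alloc()
    except AllocationFailed:
        allocator.free(first)  # roll back the partial work
        raise
    return first, second


def fuzz_build_pair(runs=200):
    """Run build_pair under many failure patterns and count the outcomes."""
    outcomes = {"ok": 0, "failed": 0}
    for seed in range(runs):
        allocator = FaultInjectingAllocator(seed)
        try:
            pair = build_pair(allocator)
        except AllocationFailed:
            # Invariant: a failed build leaves nothing allocated.
            assert not allocator.live, f"leak with seed {seed}"
            outcomes["failed"] += 1
        else:
            assert allocator.live == set(pair)
            outcomes["ok"] += 1
    return outcomes
```

Because the failures are chosen per run rather than per call site, adding or reordering allocations in `build_pair` does not change what the test means; it still checks the same invariant at whichever sites happen to fail.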

>If I'm reading it correctly, this looks like a class of bugs we discussed that is fixed by tightening up the typing. In which case, no test is strictly necessary, although I'd argue that it probably would not hurt either.

No, the original "bug" is that if you have two calls to SpaceAllocated and the second one races with a Fuse call on another thread, the second call can return a smaller value than the first. This behavior wasn't guaranteed by the public API (so not a bug) but it's desirable to add. The fix is replacing a singly linked list with a doubly linked list, using tagged pointers to avoid adding storage cost for the second set of links. A test could be written for this; I could inspect the allocated addresses of three arenas and fuse them in a specific order, but I could not actually reproduce the ordering required without a test harness that allows full control of thread execution interleaving - which would be a huge amount of work and in my opinion not actually prevent future bugs, since that interleaving only meaningfully exists in the previous implementation. The full set of possible interleavings is way too large to productively explore. If I had started by writing the test, I would have spent a bunch of time messing around before giving up; because I did the implementation first, I had a much more informed idea of what would be required to test it, and changed my plan for preventing future bugs to documentation.
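For a rough sense of scale on the interleaving point, here is a small back-of-the-envelope sketch. It assumes each thread performs a fixed number of atomic steps and counts the orderings that preserve each thread's own program order (a multinomial coefficient). The step counts are made up for illustration and are not measured from the arena code.

```python
import math


def interleavings(steps_per_thread):
    """Count orderings of all steps that keep each thread's steps in order."""
    total = math.factorial(sum(steps_per_thread))
    for steps in steps_per_thread:
        total //= math.factorial(steps)  # exact: partial quotients stay integers
    return total
```

Under those assumptions, two threads with 10 steps each already have 184,756 possible interleavings, and three such threads have about 5.5 × 10^12, which is why exhaustively driving a harness through every ordering is rarely practical.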

>I can't tell if this is refactoring or you're fixing a bug. Is there a scenario which would reproduce a bug?

In this case the "bug" is not really a bug - if we are allocating an arena, we put the overhead at the start rather than end of the first memory block. The reason we're doing this is that while it's totally valid to use the provided memory however we want, we can avoid faulting an extra page if we store the overhead near where we're about to allocate from. So the test would have to test "on a virtual memory platform, do we write to memory in a way that maximizes spatial locality". This is possible to do but it's a total mess of a test, and I have no idea how to do it on Windows. More importantly, anyone who changes this code is going to be doing it on purpose, and we don't guarantee to callers where we'll be placing any pointers we return.

>Again, I see no examples where writing a test after would have been better.

If TDD produces better results for you, that's great. I think I made a case with real examples where using TDD would have been at best neutral, but in practice would have made me spend more time to get the same results.