Hacker News
by justinpombrio 2576 days ago
> If a test has a failure rate of < 3%, it is likely not worth your time fixing it.

How do you know? What you say is plausible, but it's also plausible that these rarely-failing tests also rarely-fail in production, and occasionally break things badly and cause outages or make customers think of your software as flaky.

Since you say this, I presume you've spent the time to actually track down the root causes of several tests that fail < 3% of the time? If so, what did you find? Some sort of issues with the test framework, or issues with your own code that you're confident would only ever be exposed by testing, or something else? I'm very curious.


It's possible, but after fixing lots of these, my experience is that they're usually about things like clicking a button before a modal has finished animating out of the way.

It's sort of a "bug" in that, yes, clicking here and then there 1ms later doesn't do the best thing, but it's basically irrelevant.
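A common fix for this class of flake is to wait for an explicit condition (for example, the modal no longer being present) instead of acting immediately or sleeping for a fixed time. Below is a minimal sketch in Python. `wait_until` and `FakeModal` are hypothetical names used only for illustration; browser tools such as Selenium provide their own explicit-wait helpers for the same purpose.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition became true in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeModal:
    """Hypothetical stand-in for a modal that finishes animating after a delay."""

    def __init__(self, close_after: float) -> None:
        self._closed_at = time.monotonic() + close_after

    def is_gone(self) -> bool:
        return time.monotonic() >= self._closed_at


# Instead of clicking right away (the race), wait until the modal has cleared.
modal = FakeModal(close_after=0.2)
assert wait_until(modal.is_gone, timeout=2.0)
```

Polling a condition with a deadline tends to be both faster and less flaky than a fixed `sleep`, because it proceeds as soon as the UI is ready and only fails when the condition never holds.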

Testing is inherently a probabilistic endeavor.

"What can I do that is most likely to prevent the largest amount of bugginess?"

Fixing tests that rarely fail is -- in my experience -- a poor answer to such a question.
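One way to put a number on the cost: if a test fails independently with probability p on each run, reproducing the failure at least once with confidence c takes n = ⌈ln(1 − c) / ln(1 − p)⌉ runs. The sketch below assumes runs are independent, which real flakes may not be. The function name is illustrative.

```python
import math


def runs_to_reproduce(p: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one failure in n runs) >= confidence.

    Assumes each run fails independently with probability p (0 < p < 1).
    """
    # 1 - (1 - p)**n >= confidence  <=>  n >= log(1 - confidence) / log(1 - p)
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(runs_to_reproduce(0.03))  # 99
```

For a test that fails 3% of the time, that is roughly a hundred runs just to observe the failure once with 95% confidence, before any time is spent on diagnosis.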

> Testing is inherently a probabilistic endeavor.

That's a pretty powerful insight!

I think that a lot of developers who are firmly in the test-driven camp don't realize this, but instead think that if you have 100% test coverage, your code will work 100% of the time. Fixing bugs, to them, is "just" an inevitable result of increasing your test coverage, so that's what they focus on.

My point here is that even if a failure is caused by flaky code, general unit and integration tests are the wrong tools for finding flaky code. The only exception I have encountered is code written specifically to handle concurrent situations, where the test focuses specifically on the concurrency part.
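For the concurrency exception, a test can deliberately target the concurrent behaviour: release many threads at once and check an invariant that a race would break. Below is a minimal Python sketch. `Counter` is a hypothetical class, and a passing run is evidence rather than proof, since races are probabilistic too.

```python
import threading
import unittest


class Counter:
    """A counter whose increment is meant to be safe under concurrent use."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


class CounterConcurrencyTest(unittest.TestCase):
    def test_concurrent_increments_are_not_lost(self) -> None:
        counter = Counter()
        n_threads, n_increments = 8, 10_000
        # Release all threads together to maximize contention on the lock.
        start = threading.Barrier(n_threads)

        def worker() -> None:
            start.wait()
            for _ in range(n_increments):
                counter.increment()

        threads = [threading.Thread(target=worker) for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        # Any lost update from a race would make the total come up short.
        self.assertEqual(counter.value, n_threads * n_increments)
```

Run it with `python -m unittest`. The test is aimed squarely at the locking, which is the situation where a test is the right tool for catching a flaky failure.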

Flaky tests most commonly occur in integration and browser-based tests, where there are multiple layers of tools that each fail a small percentage of the time.
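When each layer fails independently with a small probability pᵢ, the chance that a single run fails somewhere is 1 − ∏(1 − pᵢ), so small per-layer rates add up. The sketch below uses made-up per-layer rates, purely for illustration; real layers are not necessarily independent.

```python
import math


def combined_failure_rate(layer_rates: list[float]) -> float:
    """P(at least one layer fails in a run), assuming layers fail independently."""
    return 1 - math.prod(1 - p for p in layer_rates)


# Illustrative (not measured) per-run failure rates for four layers of tooling.
rates = [0.005, 0.005, 0.01, 0.01]
print(f"{combined_failure_rate(rates):.4f}")  # 0.0297
```

Four layers that each fail 0.5% to 1% of the time already give roughly a 3% failure rate for the test as a whole.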

Unit tests also sometimes fail because state isn't cleaned up properly, which only breaks things when tests run in a very specific order. Or there are subtle assumptions in the tests about database row ordering that hold only 99% of the time.
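Both failure modes can be sketched in a few lines of Python. `RegistryTest` resets shared state in `setUp`, so its results do not depend on which test ran first. `QueryTest` compares results without relying on row order. `_REGISTRY` and `fetch_user_names` are hypothetical stand-ins for shared state and a database query. With a real database, the equivalent fix is an explicit `ORDER BY` or an order-insensitive comparison.

```python
import unittest

# Hypothetical module-level state shared by every test in the process.
_REGISTRY: list[str] = []


def register(name: str) -> None:
    _REGISTRY.append(name)


def fetch_user_names() -> list[str]:
    # Stand-in for a query without ORDER BY: callers must not assume row order.
    return ["carol", "alice", "bob"]


class RegistryTest(unittest.TestCase):
    def setUp(self) -> None:
        # Reset shared state so the outcome does not depend on test execution order.
        _REGISTRY.clear()

    def test_register_one(self) -> None:
        register("alice")
        self.assertEqual(_REGISTRY, ["alice"])

    def test_register_two(self) -> None:
        register("bob")
        register("carol")
        self.assertEqual(_REGISTRY, ["bob", "carol"])


class QueryTest(unittest.TestCase):
    def test_names_ignore_row_order(self) -> None:
        # Compare as a multiset instead of assuming the database returns a fixed order.
        self.assertCountEqual(fetch_user_names(), ["alice", "bob", "carol"])
```

Without the `_REGISTRY.clear()` call, whichever registry test ran second would fail. That is exactly the order-dependent breakage that shows up only under certain runners or parallel schedules.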