| We are working on a few things. The first is automatic retries of fast tests. If a test runs quickly and fails, it costs us little to try again just to make sure. Most of our unit tests are configured to run up to 3 times. Another thing is keeping a database of individual test case passes/failures across all time. This will let us automatically mark tests as flaky if they fail often, and ignore their results programmatically rather than requiring a human to manually mark the test as ignorable. A third thing is automatically filing bugs against the owners/authors of tests which have been marked flaky. This is controversial -- often a test is just fine until one of its underlying libraries has a race condition introduced, and the real person to fix it should be the author of that change, not the author of the test. But it is still a step in the right direction much of the time. Many people subscribe to the philosophy that "a flaky test is worse than no test", because you think it is giving you information when in fact it is giving you none. I subscribe to a slightly different philosophy: "A test with a known flaky rate is hugely valuable". If you know how often a test flakes (statistically), then you can measure deviations from that rate to detect changes. Of course, a flaky test with an unknown flake rate is still useless. Hence the second initiative above: measuring the flake rate of everything. |
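To make the "known flaky rate" idea concrete, here is a minimal sketch of how a deviation from a known rate could be detected. It assumes runs are independent and uses a plain one-sided binomial tail; the function names and the `alpha` threshold are hypothetical choices for illustration, not anything described in the comment above.

```python
from math import comb


def binomial_tail(failures: int, runs: int, flake_rate: float) -> float:
    """Probability of seeing at least `failures` failures in `runs` runs
    if each run fails independently with probability `flake_rate`."""
    return sum(
        comb(runs, k) * flake_rate**k * (1 - flake_rate) ** (runs - k)
        for k in range(failures, runs + 1)
    )


def looks_like_regression(
    failures: int, runs: int, flake_rate: float, alpha: float = 0.01
) -> bool:
    """Flag a test when its recent failure count would be very unlikely
    under its historical flake rate (alpha is an assumed threshold)."""
    return binomial_tail(failures, runs, flake_rate) < alpha


if __name__ == "__main__":
    # A test with a historical 5% flake rate, observed over its last 50 runs.
    print(looks_like_regression(failures=2, runs=50, flake_rate=0.05))
    print(looks_like_regression(failures=15, runs=50, flake_rate=0.05))
```

With a 5% historical rate, 2 failures in 50 runs is unremarkable, while 15 in 50 is flagged. Without the historical rate there is nothing to compare against, which is the sense in which an unknown flake rate is useless.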
Firefox has an experimental "chaos mode" that takes the opposite approach. It purposely randomizes behavior by adjusting thread priorities, changing hash table iteration order, and randomizing timer durations. Unfortunately, many flaky tests fail in chaos mode, so it is not enabled by default.
http://robert.ocallahan.org/2014/03/introducing-chaos-mode.h...
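For a rough feel of the idea, here is a toy sketch in Python. It is not how Firefox implements chaos mode; `chaos_order`, `chaos_delay`, and `find_failing_seeds` are made-up helpers that only imitate two of the knobs mentioned above (iteration order and timer durations), driven by a seed so that a failure can be replayed.

```python
import random


def chaos_order(items, rng):
    """Return the items in a randomized order, imitating an unstable
    hash table iteration order."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def chaos_delay(base_seconds, rng, spread=0.5):
    """Return a timer duration jittered by up to +/- `spread` (a fraction
    of the base duration)."""
    return base_seconds * rng.uniform(1 - spread, 1 + spread)


def find_failing_seeds(test, seeds):
    """Run `test(rng)` once per seed and collect the seeds that make it
    fail, so each failure can be reproduced deterministically."""
    failing = []
    for seed in seeds:
        rng = random.Random(seed)
        try:
            test(rng)
        except AssertionError:
            failing.append(seed)
    return failing


def order_dependent_test(rng):
    # Hidden assumption: "a" always comes first when iterating.
    keys = chaos_order(["a", "b", "c"], rng)
    assert keys[0] == "a"


if __name__ == "__main__":
    print(find_failing_seeds(order_dependent_test, range(20)))
```

A test that only passes because of an accidental ordering or timing shows up as a list of failing seeds, and rerunning with one of those seeds reproduces the failure.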