by IanWhalen
3925 days ago
I'm curious whether you've got any interesting projects underway to tackle the flaky test issue. I've been looking at FB's test lifecycle approach with some interest (http://goo.gl/QtUNxY), but would love to hear about any other approaches to solving the same problem. And FWIW, building MongoDB with Evergreen (https://evergreen.mongodb.com), we've found that a stepback approach seems to give us the finest granularity at minimal cost in identifying where an error was introduced: we batch commits for execution, but when a specific test fails we start running just that test on each previous commit until we hit a passing execution. It obviously doesn't work perfectly in the face of flaky tests (see above question), but it seems to do pretty well.
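To make the stepback idea concrete, here is a minimal Python sketch. It is not Evergreen's actual implementation: `stepback`, `commits`, and `test_passes` are hypothetical names, and it assumes the commit list is ordered oldest to newest and that the test result is deterministic.

```python
from typing import Callable, Optional, Sequence


def stepback(
    commits: Sequence[str],
    failing_index: int,
    test_passes: Callable[[str], bool],
) -> Optional[str]:
    """Walk back from a known-failing commit until the test passes.

    `commits` is ordered oldest to newest and `commits[failing_index]`
    is known to fail. Returns the oldest commit in the unbroken run of
    failures, i.e. the likely point where the error was introduced, or
    None if no passing commit was found.
    """
    culprit = commits[failing_index]
    for i in range(failing_index - 1, -1, -1):
        if test_passes(commits[i]):
            return culprit  # the commit right after the last passing one
        culprit = commits[i]
    return None  # never saw a pass, so the introduction point is unknown
```

With a flaky test, `test_passes` can report a spurious failure or pass, which is exactly why this walk can stop at the wrong commit.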
The first is automatic retries of fast tests. If a test runs quickly and fails, it costs us little to try again just to make sure. Most of our unit tests are configured to run up to 3 times.
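A minimal sketch of such a retry policy in Python, assuming a test is a callable that raises `AssertionError` on failure. `run_with_retries` and `max_attempts` are illustrative names, not part of any real test runner.

```python
from typing import Callable


def run_with_retries(test: Callable[[], None], max_attempts: int = 3) -> int:
    """Run `test` up to `max_attempts` times.

    Returns the 1-based attempt number that passed. If every attempt
    fails, the last AssertionError is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            test()
            return attempt
        except AssertionError:
            if attempt == max_attempts:
                raise  # out of retries: report the failure
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Returning the attempt number rather than a plain pass/fail keeps the "passed only on retry" signal, which is useful input for flake tracking.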
Another thing is keeping a database of individual test case passes/failures across all time. This will let us automatically mark tests as flaky if they fail often, and ignore their results programmatically rather than requiring a human to manually mark the test as ignorable.
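As a rough sketch of that bookkeeping, here is an in-memory stand-in for such a database. `ResultHistory`, the 5% threshold, and the 20-run minimum are all arbitrary assumptions for illustration; a real system would persist results and would also need to tell a flaky test apart from a genuine regression.

```python
from typing import Dict, List


class ResultHistory:
    """Per-test pass/fail counts with a simple flakiness rule."""

    def __init__(self, flaky_threshold: float = 0.05, min_runs: int = 20):
        self.flaky_threshold = flaky_threshold
        self.min_runs = min_runs
        self._counts: Dict[str, List[int]] = {}  # name -> [failures, runs]

    def record(self, name: str, passed: bool) -> None:
        counts = self._counts.setdefault(name, [0, 0])
        if not passed:
            counts[0] += 1
        counts[1] += 1

    def failure_rate(self, name: str) -> float:
        failures, runs = self._counts.get(name, [0, 0])
        return failures / runs if runs else 0.0

    def is_flaky(self, name: str) -> bool:
        """Flaky: enough runs, fails often, but does not fail every time."""
        _, runs = self._counts.get(name, [0, 0])
        if runs < self.min_runs:
            return False  # not enough data to judge
        rate = self.failure_rate(name)
        return self.flaky_threshold <= rate < 1.0
```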
A third thing is, obviously, automatically filing bugs against the owners/authors of tests which have been marked flaky. This is controversial -- often a test is just fine until one of its underlying libraries has a race condition introduced, and the real person to fix it should be the author of that change, not the author of the test. But it is still a step in the right direction much of the time.
Many people subscribe to the philosophy that "a flaky test is worse than no test", because you think it is giving you information when in fact it is giving you none. I subscribe to a slightly different philosophy: "a test with a known flake rate is hugely valuable". If you know how often a test flakes (statistically), then you can measure deviations from that rate to detect changes. Of course, a flaky test with an unknown flake rate is still useless. Hence the second initiative above: measuring the flake rate of everything.
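One way to turn "measure deviations from that rate" into a check is a one-sided binomial test: given a baseline flake rate, ask how likely it is to see at least this many failures in recent runs by chance alone. The sketch below assumes independent runs; `flake_rate_regressed` and the `alpha` cutoff are illustrative choices, not something described above.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flake_rate_regressed(
    failures: int, runs: int, baseline_rate: float, alpha: float = 0.01
) -> bool:
    """True if `failures` out of `runs` is implausibly high for the baseline.

    Only increases in the failure rate are detected (one-sided test).
    """
    return binomial_tail(runs, failures, baseline_rate) < alpha
```

For a test with a 2% baseline, 1 failure in 50 runs is unremarkable, while 8 failures in 50 runs would be flagged as a change in behavior.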