But we (Mozilla) are also doing many of the same things as the Chromium team here. In particular there is work in progress to automatically "ignore" the results of known-flaky tests until we detect that there has been a change in the rate of flakiness, at which point we will (assuming all goes to plan) trigger new test runs until we can determine the point at which the regression was introduced.

I think one of the lessons we've learnt is that with a browser-type project it's very hard to make test runs fully deterministic, for both technical and human reasons. The technical reasons are touched on in the original article: these are complex codebases with lots of moving parts and lots of environmental dependencies. Of course there are various tactics to try to combat this; for example, there is a wiki page dedicated to innocuous-looking code that leads to intermittent tests [1].

The human reasons centre around the difficulty of getting people to care about spending time fixing a test that fails one time in 1,000 (which is still very noticeable when you are running it hundreds of times a day). Unless the issue fits a known pattern, it's hard work to fix, it's difficult to tell whether your fix even worked, and it's unlikely to be considered a top priority because the benefits are diffuse and hard to quantify.

I think the fact that both Google and Mozilla still have significant problems with intermittents, despite talented engineering staff and it having been a known problem for years, implies that some of the standard thinking about making tests fully deterministic simply doesn't apply. For this kind of work you have to embrace, or at least accept, the randomness, and look for ways to get the data you need despite the noise.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/QA/Avoiding...
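To make the "detect a change in the rate of flakiness" idea concrete, here is a minimal sketch. It is not Mozilla's or Chromium's actual implementation; the function names, the one-sided binomial test, and the `alpha` threshold are all assumptions chosen for illustration. It asks how surprising the recent failure count would be if the test were still failing at its historical rate.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    # Subtract the probability of seeing fewer than k failures.
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below


def flakiness_changed(baseline_rate: float, recent_runs: int,
                      recent_failures: int, alpha: float = 0.001) -> bool:
    """Hypothetical check: flag a test whose recent failure count would be
    very unlikely if it were still failing at its historical baseline rate."""
    return binomial_tail(recent_runs, recent_failures, baseline_rate) < alpha


# A test that historically fails 1 run in 1,000:
print(flakiness_changed(0.001, 500, 1))   # one failure in 500 runs is ordinary noise
print(flakiness_changed(0.001, 500, 10))  # ten failures in 500 runs is not
```

A flagged test would then be the trigger for the extra runs on earlier revisions mentioned above; how those runs are scheduled is outside this sketch.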
Could you identify "all" the flaky tests by running all the tests 100 times on the same stable build (like the latest ESR)? Is it even possible to write a test that could pass 100 times in a row? :)
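As a rough back-of-the-envelope check on the 100-run idea, using the one-in-1,000 failure rate from the parent comment: the sketch below assumes every run is independent and fails with the same fixed probability, which real intermittents often violate. The function names are made up for illustration.

```python
import math


def detection_probability(failure_rate: float, runs: int) -> float:
    """Chance of seeing at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of runs that shows at least one failure
    with the given confidence, under the same independence assumption."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


print(f"{detection_probability(0.001, 100):.3f}")
print(runs_needed(0.001))
```

Under those assumptions, 100 runs would catch a one-in-1,000 flake only about one time in ten, and it would take on the order of 3,000 runs to catch it with 95% confidence. Equally, a test that passes 100 times in a row is not thereby shown to be deterministic.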