by cpeterso 3926 days ago
That's a good point that making tests fully deterministic is not actually possible. But users aren't deterministic either, so we must accept noisy test environments because that's what users see. Tracking changes in the rate of flakiness is an interesting idea.

Could you identify "all" the flaky tests by running all the tests 100 times on the same stable build (like the latest ESR)? Is it even possible to write a test that could pass 100 times in a row? :)

1 comments

Running a test N times will certainly detect some fraction of all the flaky tests. It's something we occasionally do manually to work out whether, e.g., a certain intermittent is (likely) fixed, and it's something that we'd like to do more of in order to quarantine new tests.
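To make the "some fraction" part concrete, here is a minimal sketch of the repeat-run idea. It is not our actual harness; `failure_rate` and `chance_of_clean_run` are hypothetical names, and the second function assumes each run fails independently with a fixed probability, which real intermittents often don't.

```python
def failure_rate(test, runs=100):
    """Run `test` `runs` times and return the fraction of runs that raised."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except Exception:
            # Any exception counts as a failed run in this sketch.
            failures += 1
    return failures / runs


def chance_of_clean_run(p_fail, runs=100):
    """Probability that a test passes every run, assuming each run
    fails independently with probability `p_fail` (an assumption)."""
    return (1 - p_fail) ** runs
```

Under that independence assumption, a test that fails 1% of the time still has roughly a 37% chance of passing 100 runs in a row (`chance_of_clean_run(0.01, 100)`), so a clean run is only weak evidence that a test is stable.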

Unfortunately there are various confounding factors, which mean that many intermittent tests that look clean in such a run might nevertheless be problematic. For example, if you only run the tests that you think are intermittent, problems that are triggered by state left over from a previous test won't be found. This is one reason that we've been trying to run particularly problematic test types (e.g. Firefox browser-chrome tests) in smaller groups, restarting the browser with a clean profile between groups to clear the state. A group size of 1 would obviously be ideal here, but when you have thousands of tests and limited resources it's not practical.
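A rough sketch of the grouping idea, under heavy simplification: `run_in_groups` and `fresh_state` are hypothetical, and a plain dict stands in for the browser profile. The real thing restarts the browser; this only shows why group size matters for leaked state.

```python
def run_in_groups(tests, group_size, fresh_state):
    """Run (name, test) pairs in groups of `group_size`.

    `fresh_state()` is a hypothetical stand-in for restarting the
    browser with a clean profile; it is called once per group.
    Returns a dict mapping test name to True (passed) or False (failed).
    """
    results = {}
    for start in range(0, len(tests), group_size):
        state = fresh_state()  # clean slate shared by this group only
        for name, test in tests[start:start + group_size]:
            try:
                test(state)
                results[name] = True
            except Exception:
                results[name] = False
    return results
```

With a group size of 1 every test sees clean state, so a test that only fails because of a predecessor's leftovers passes. With larger groups it fails only when it happens to share a group with the test that leaks, and running the suspect test on its own would never reproduce that.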

The other problem is tests that have unexpected sensitivity to the environment. For example, the other day DNS was being slow on the test infrastructure. This isn't a problem for most tests, since they use something like /etc/hosts. But some tests were intentionally trying to use a non-resolving domain, and those tests suddenly started to time out at random.