by _untra_ 2008 days ago
I've found the number one cause of flakiness in tests is misuse of nondeterministic or highly stateful functions. This becomes especially apparent when you recognize that a lot of nondeterminism comes from misusing datetime libraries such as momentJS, or from Math.random. Even integration tests against certain "eventually consistent" databases tend to result in flaky tests. Entropy and time libraries are intentionally nondeterministic, so their use should be recognized when writing tests, and when designing functional "cores" that can do predictable operations when given testable inputs.

Ain't no one got time for flaky tests.
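To make the functional "core" idea concrete, here is a minimal sketch in Python (the same shape applies to Math.random and momentJS). The function names, the coupon example, and the alphabet are all made up for illustration. The point is only that the clock and the random source are passed in, so a test can pin both.

```python
import random
from datetime import datetime, timedelta, timezone

# Hypothetical alphabet for illustration (no look-alike characters).
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def make_coupon(now: datetime, rng: random.Random, valid_days: int = 7) -> dict:
    """Pure core: the result depends only on the arguments."""
    code = "".join(rng.choice(ALPHABET) for _ in range(8))
    return {"code": code, "expires_at": now + timedelta(days=valid_days)}


def make_coupon_now() -> dict:
    """Thin impure shell: the only place that touches the clock and system entropy."""
    return make_coupon(datetime.now(timezone.utc), random.Random())


# In a test both inputs are pinned, so two runs give the same answer.
fixed_now = datetime(2020, 1, 1, tzinfo=timezone.utc)
first = make_coupon(fixed_now, random.Random(42))
second = make_coupon(fixed_now, random.Random(42))
assert first == second
assert first["expires_at"] == datetime(2020, 1, 8, tzinfo=timezone.utc)
```

Only make_coupon_now is nondeterministic, and it is small enough that it barely needs a test of its own.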

3 comments

Yes, definitely that. I've also seen entire piles of tests be flaky because:

- They relied on something or other over the network, or
- They used Selenium to try to see whether the DOM was updated correctly, and either Selenium itself or the way the code interacted with it made _all_ the tests flaky to some degree.

The general trend I've seen is that the more "e2e" a test is, the flakier it is.

End to end tests definitely end up being flaky, especially in large systems. One level of testing would be unit tests, but e2e tests have their own place, where they do end to end sanity checks. In my experience at Rippling, we have managed to identify a lot of such flakiness by pure first principles reasoning about the behavior, and in most cases, it turned out to be a subtle bug in the code. As the org grows larger, there should be a team that just attacks flaky tests, either by reviewing and fixing the tests directly, or by building tools that make it easier for the product teams to find the gap!
e2e is the number one productivity killer at my org, by a long shot. If there is such a thing as a non-flaky e2e test, I have yet to see it. That, or the test does nothing.

Selenium seems fine for the most part. It's a solid tool. Where it falls apart is when developers do not account for all the various ways the browser session will go wrong. There are simply too many variables at play: A/B tests, cookies, popups, network conditions, machine speed (and current load). Writing e2e is like being blindfolded with one hand tied behind your back while your coworkers take turns spinning you in your chair.
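One way to account for machine speed and load is to poll for a condition with a deadline instead of sleeping for a fixed time. Below is a rough sketch of that idea in plain Python. wait_until and its parameters are invented for illustration and are not Selenium's API (Selenium has its own explicit-wait helper, WebDriverWait, which is the thing to reach for in real browser tests). The clock and sleep functions are injectable so the helper itself can be tested without real waiting.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Test double: 'sleeping' just advances a counter, so tests run instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

A fixed sleep is either too short on a loaded CI box or too long everywhere else. A deadline with polling at least makes the timeout explicit.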

It constantly amazes me that companies think they can put more tasks on a developer's plate with zero impact on productivity. They really do believe that all testing is free and will somehow pay for itself. And yet a single QA human going through a simple testing plan will catch more actual bugs than hundreds of e2e tests that cost a literal fortune to maintain.

IME the source of flakiness in tests is always something that you can deal with given enough time, and about 15-20% of the time it is a bug in the code itself, sometimes quite a dangerous one.
The worst thing to deal with in regard to determining correctness of code is global, shared, mutable state. Timestamps fit that bill (even though it's the Universe changing the state).

For testing purposes I often find myself making functions that take the date as an argument. If the language supports default values, I'll set it as a default value. If it doesn't, I'll make a convenience method or a null check to set it to 'now' if none is provided.
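A small sketch of that pattern in Python, with a made-up function name. The null check matters here: writing `today: date = date.today()` as the default would evaluate once at import time and silently go stale in a long-running process.

```python
from datetime import date
from typing import Optional


def days_until_renewal(renewal: date, today: Optional[date] = None) -> int:
    """Hypothetical example: days from `today` until `renewal`."""
    # Resolve 'now' at call time, and only when the caller did not supply a date.
    if today is None:
        today = date.today()
    return (renewal - today).days


# Tests pass the date explicitly, including awkward ones like leap years.
assert days_until_renewal(date(2021, 3, 1), today=date(2021, 2, 1)) == 28
assert days_until_renewal(date(2020, 3, 1), today=date(2020, 2, 1)) == 29
```

Production callers just write days_until_renewal(renewal) and get today's date.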

It turns out though that a lot of code we write to run within ±30 seconds of 'now' ends up over time having to run (or re-run) on old or future dates. So with the exception of logging and events, having that as an argument turns out to be useful or at least neutral.

For logging and events I'd probably use a mock timing library anyway.
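For the logging and events case, a rough sketch of what mocking the clock can look like using only Python's standard library. record_event is a made-up function, and a dedicated time-freezing library would do the same job with less ceremony.

```python
import time
from unittest import mock


def record_event(name: str) -> dict:
    """Hypothetical event recorder that reads the wall clock directly."""
    return {"name": name, "ts": time.time()}


def test_record_event_uses_frozen_clock() -> None:
    # Replace time.time for the duration of the block so the timestamp is known.
    with mock.patch("time.time", return_value=1_600_000_000.0):
        event = record_event("signup")
    assert event == {"name": "signup", "ts": 1_600_000_000.0}


test_record_event_uses_frozen_clock()
```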

Agreed. Flaky tests are a bug and the only agreeable solution is to identify and remove the non-deterministic inputs.
But sometimes the bug is in the code not the test, and you wouldn't have known about the bug if you didn't write the flaky test! A flaky test which fails once in every N test suite runs is better than no test at all.
They say that contempt is the beginning of the end of the Rule of Law, which is why you should be careful not to pass frivolous laws.

Tests I've found are much the same way. You don't write one flaky test and stop. If you write one and everybody is okay with it, you and your coworkers write more, and more, until there are 50, 100. Once the suite flakes out on an interval, nobody takes a failed test seriously, and then broken code doesn't get checked for hours. One broken test? I bet it's the usual. I'll just rerun it a couple of times.
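The arithmetic behind this is worth seeing once. A quick sketch, assuming each flaky test fails independently at the same rate (real flakes are often correlated, so treat the numbers as illustrative, and the 1% rate is picked arbitrarily):

```python
def p_suite_green(flaky_tests: int, failure_rate: float) -> float:
    """Probability that every flaky test passes in a single run,
    assuming independent failures at the same rate."""
    return (1.0 - failure_rate) ** flaky_tests


for n in (1, 10, 50, 100):
    print(f"{n:>3} flaky tests -> suite green in {p_suite_green(n, 0.01):.1%} of runs")
```

With a 1% failure rate per test, 50 flaky tests leave the suite green in roughly 60% of runs and 100 leave it green in roughly 37%. At that point a red build carries almost no information.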

The thread you're pulling on here starts to unravel the whole Continuous Integration sweater.

We actually have different classes of tests to allow for some more flaky tests. You definitely don't want to run those flaky tests after every build, but you should be able to eventually get a run where those tests pass before handing it off to customers.
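One possible way to set up such classes, sketched with Python's standard unittest module. The RUN_QUARANTINED environment variable, the decorator name, and the test names are all invented for illustration. The quarantined tests are skipped on the per-build run and only execute when the variable is set, for example in a pre-release job.

```python
import os
import unittest

# Hypothetical switch: the per-build job leaves it unset, the pre-release job sets it to "1".
RUN_QUARANTINED = os.environ.get("RUN_QUARANTINED") == "1"

quarantined = unittest.skipUnless(
    RUN_QUARANTINED, "quarantined test: set RUN_QUARANTINED=1 to run"
)


class CheckoutTests(unittest.TestCase):
    def test_total_is_sum_of_items(self) -> None:
        # Fast, deterministic: runs on every build.
        self.assertEqual(sum([2, 3]), 5)

    @quarantined
    def test_end_to_end_checkout(self) -> None:
        # Placeholder for a slower, flakier e2e check.
        self.assertTrue(True)
```

The skipped tests still show up in the report as skipped, which keeps the quarantine visible instead of letting it rot quietly.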
Interesting. Doesn’t this imply that the behavior itself is flaky?
Right - but you fix the bug. You don't add a tolerance.