by pytester 2580 days ago
What I found to be the major reasons for flaky tests:

* Non-determinism in the code - e.g. select without an order by, random number generators, hashmaps turned into lists, etc. - Fixed by turning non-deterministic code into deterministic code, testing for properties rather than outcomes or isolating and mocking the non-deterministic code.

* Lack of control over the environment - e.g. calling a third party service that goes down occasionally, use of a locally run database that gets periodically upgraded by the package manager - Fixed by gradually bringing everything required to run your software under control (e.g. installing specific versions without the package manager, mocking 3rd party services, intercepting syscalls that get the time and replacing them with consistent values).

* Race conditions - in this case the test should repeat the same actions enough times that it catches the flakiness consistently.
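A rough Python sketch of the fixes above, using hypothetical helper names: comparing order-insensitively when ordering isn't guaranteed, injecting a seeded RNG and a fixed clock (a swapped-in alternative to intercepting time syscalls), checking a property instead of an exact outcome, and repeating a concurrent operation many times so a rare race is more likely to surface.

```python
import random
import threading
from datetime import datetime, timezone


# 1. Non-deterministic ordering: iteration order of a set (like a SELECT
#    without ORDER BY) is not something a test should rely on.
def active_user_ids(users):
    return list({u["id"] for u in users if u["active"]})


def test_active_user_ids():
    users = [{"id": 3, "active": True}, {"id": 1, "active": True},
             {"id": 2, "active": False}]
    assert sorted(active_user_ids(users)) == [1, 3]  # order-insensitive compare


# 2. Randomness and time are passed in rather than read from global state,
#    so a test can supply a seeded RNG and a fixed clock.
def make_token(rng, now):
    return f"{now():%Y%m%d}-{rng.randint(0, 9999):04d}"


def fixed_clock():
    return datetime(2020, 1, 2, tzinfo=timezone.utc)


def test_make_token_reproducible():
    a = make_token(random.Random(42), fixed_clock)
    b = make_token(random.Random(42), fixed_clock)
    assert a == b and a.startswith("20200102-")


# 3. Testing a property instead of an exact outcome: any shuffle must be a
#    permutation of its input, whatever order it happens to produce.
def test_shuffle_is_permutation():
    items = list(range(10))
    shuffled = items[:]
    random.shuffle(shuffled)  # deliberately unseeded
    assert sorted(shuffled) == items


# 4. Race conditions: repeat the same concurrent actions many times so a
#    rare bad interleaving is far more likely to show up in one test run.
class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # without the lock, lost updates become possible
            self.value += 1


def run_concurrently(n_threads=4, per_thread=1000):
    counter = Counter()

    def worker():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


def test_counter_under_contention():
    for _ in range(20):
        assert run_concurrently() == 4 * 1000
```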


Some other funny causes I've seen:

- The temperature is much hotter/colder than normal

- Someone is inadvertently holding down a button or key on the machine under test

- The wrong version of software is loaded onto the machine

> - The temperature is much hotter/colder than normal

I like this 1999 story about a flaky test at Be:

> Two test engineers were in a crunch. The floppy drive they were currently testing would work all day while they ran a variety of stress tests, but the exact same tests would run for only eight hours at night. After a few days of double-checking the hardware, the testing procedure, and the recording devices, they decided to stay the night and watch what happened. For eight hours they stared at the floppy drive and drank espresso. The long dark night slowly turned into day and the sun shone in the window. The angled sunlight triggered the write-protection mechanism, which caused a write failure. A new casing was designed and the problem was solved. Who knew?

https://www.haiku-os.org/legacy-docs/benewsletter/Issue4-22....

Those first two seem like inadequate insulation of the test suite (no pun intended).

> e.g. calling a third party service that goes down occasionally

I thought tests weren't meant to have external dependencies (or at least, ones outside the control of the test harness)?

In the past I've kept external dependencies in the tests until they started to cause issues. Some dependencies in some projects (e.g. hard coded CDN links, time) haven't actually caused any problems.

For very complex dependencies I would build a mock that could run in a passthrough / mock mode where I could test realistically (in passthrough mode) and test deterministically (in mock mode, using a recording of the passthrough mode).

This helped with getting rid of flaky tests (mock mode), ensuring 3rd party services don't get hammered (mock mode), and isolating and detecting breakages caused by external service changes (passthrough mode).
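A minimal sketch of that passthrough/mock idea, assuming the external service can be wrapped behind a single callable that takes a request key and returns a JSON-serializable response. `RecordingClient` and the JSON "cassette" file are hypothetical names, not any particular library. Passthrough mode calls the real service and records each response; mock mode replays the recording and fails loudly on a request it has never seen.

```python
import json
from pathlib import Path


class RecordingClient:
    """Wraps a hypothetical `fetch(key) -> JSON-serializable response` callable."""

    def __init__(self, fetch, cassette: Path, mode: str = "mock"):
        if mode not in ("passthrough", "mock"):
            raise ValueError(f"unknown mode: {mode}")
        self._fetch = fetch
        self._cassette = cassette
        self._mode = mode
        self._recorded = json.loads(cassette.read_text()) if cassette.exists() else {}

    def get(self, key: str):
        if self._mode == "passthrough":
            # Real call: realistic, but exposed to outages and rate limits.
            response = self._fetch(key)
            self._recorded[key] = response
            self._cassette.write_text(
                json.dumps(self._recorded, indent=2, sort_keys=True))
            return response
        # Mock mode: deterministic and never touches the real service.
        if key not in self._recorded:
            raise KeyError(f"no recording for {key!r}; rerun in passthrough mode")
        return self._recorded[key]
```

Running the suite in passthrough mode occasionally (say, nightly) refreshes the recordings and flags upstream changes, while the everyday runs stay in mock mode.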

In this context, yes, tests shouldn't require external dependencies. By "tests" we're really talking about tests like, "is this particular build consistent with its spec?"

There could be other types of test where a remote call would make sense, for example, "was the deployment successful?" tests might try to verify that the deployed version of the software can communicate with external dependencies correctly.

There are also less justifiable cases you might run into, especially once you start going down the road of "my dev environment should be a clone of production".

If you have an Employee model and it returns certain attributes of an employee like Salary, you might have tests that depend on the structure of an employee. You might have, say, Job and Position models which define an employee-job and the base definition of the particular job. Say Position has an associated salary range, and Job has validation rules which check that the salary is in range.

You could define factories for all those things, or you could use real examples that are served by a live Employee API.

The canonical way to address this is with factories and mocks; if you have time, do that! (It will probably save you in the long run, once that complexity has grown a bit.)
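A minimal factory sketch for the Employee/Job/Position example. The dataclasses, field names, and `make_*` helpers are hypothetical, chosen only to mirror the description above. Each factory hands a test a valid default object and lets it override only the field it cares about, so "an employee whose salary is out of range" is constructed rather than looked up in a live API.

```python
from dataclasses import dataclass


@dataclass
class Position:
    title: str
    salary_min: int
    salary_max: int


@dataclass
class Job:
    employee_name: str
    position: Position
    salary: int

    def validation_errors(self):
        # Mirrors the rule described above: salary must be in the position's range.
        if not self.position.salary_min <= self.salary <= self.position.salary_max:
            return ["salary out of range for position"]
        return []


def make_position(**overrides):
    # Valid defaults; a test overrides only what it cares about.
    defaults = {"title": "Engineer", "salary_min": 50_000, "salary_max": 90_000}
    return Position(**{**defaults, **overrides})


def make_job(**overrides):
    position = overrides.pop("position", None)
    if position is None:
        position = make_position()
    defaults = {"employee_name": "Test Employee", "position": position,
                "salary": position.salary_min}
    return Job(**{**defaults, **overrides})
```

The out-of-range case then becomes `make_job(salary=120_000)`, which keeps working even if the real employee with that property is edited or deleted.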

If you just grab the example person whose salary is out of the range for their position and quickly test that the behavior in nearby modules matches your expectations, well, those are still tests, and you could be forgiven for writing them this way.

I think they call these the "London" and "Detroit" styles of mocking, but the short version IMHO is that the mistake was making dev a clone of production, and any errors in judgement that came after that were merely coping mechanisms.

If you want your tests to tell you when something has changed that requires your attention, you need a test that hits this Employee API and fails if the structure of the employees it returns no longer conforms to your expectations, even though it's external. I won't profess to know how to design such a thing well.
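One possible shape for such a check, offered only as a sketch: validate whatever the live Employee API returns against the fields the rest of the code depends on, and run it separately from the fast, deterministic suite. The URL, the field names, and the `fetch_json`/`check_employee_api` helpers are all hypothetical; the fetcher is a parameter so the check itself can be exercised without the network.

```python
import json
import urllib.request

# Hypothetical fields the rest of the code relies on, with expected types.
EXPECTED_EMPLOYEE_FIELDS = {"id": int, "name": str, "salary": (int, float),
                            "position_id": int}


def structure_problems(employee: dict) -> list:
    problems = []
    for name, expected_type in EXPECTED_EMPLOYEE_FIELDS.items():
        if name not in employee:
            problems.append(f"missing field {name!r}")
        elif not isinstance(employee[name], expected_type):
            problems.append(f"field {name!r} has type {type(employee[name]).__name__}")
    return problems


def fetch_json(url: str, timeout: float = 10.0):
    # Real network call; keep this out of the deterministic suite.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def check_employee_api(url: str = "https://api.example.com/employees",
                       fetch=fetch_json) -> dict:
    # Returns {employee id: [problems]} for every non-conforming employee.
    employees = fetch(url)
    problems = {e.get("id"): structure_problems(e) for e in employees}
    return {k: v for k, v in problems.items() if v}
```

An empty result means the API still looks the way the code expects; anything else names the employee and the field that drifted.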

(It's better to version your API and write a changelog that tells what you need to know if the old version has been replaced by a new version, but if you're writing these microservices all for yourself it can seem pedantic to explicitly version your API, too. There are also coping mechanisms you'll need to embrace once you get to "we're not incrementing the API version" and surprise, many of them are the same ones...)

Each thing you remove from your tests reduces the value of the results by some amount.

For some programs, testing without external dependencies is basically useless. Other times, you can remove them without much loss. But it's always better if you can keep them.

In theory, yes. In practice it's sometimes inconvenient, or hard, or impossible to set up all the mocks and proxies, especially in integration tests.