by piokoch 2576 days ago
"Non-deterministic tests have two problems, firstly they are useless, secondly they are a virulent infection that can completely ruin your entire test suite."

"To this I would like to add that flaky tests are an incredible cost to businesses."

I think the misconception here is that "tests should not fail" because a failure is a "cost" that "has to be analyzed and fixed", etc.

An integration or functional test that is guaranteed to never fail is kind of useless to me. A good test with a lot of assertions will fail occasionally because things happen: unexpected data is provided, someone manually played with the database, the NTP service was accidentally stopped so the date is not accurate and filtering by date fails, or someone plugged in an additional system that alters or locks data.

In the case of unit tests, well, if everything is mocked and isolated then yes, such a test probably should never fail, but unit tests are mostly useful only if there is some complicated logic involved.

2 comments

> An integration or functional test that is guaranteed to never fail is kind of useless for me.

I think that's an important distinction between functional and integration tests. Generally, a functional test is supposed to exercise a particular set of APIs or code paths - across components in a semi-realistic arrangement, so unlike a unit test where all but one would be mocked, but still pretty focused. It's OK for such a test to ignore concerns outside of its own scope. Data validation/sanitization should have its own tests, for example, and not be a part of every other functional test. That's just duplication of effort for very little benefit.

By contrast, it's reasonable for an integration test to fail due to something external like NTP failure ... once. After that, there should be a separate functional/regression test to ensure that the dependency is properly isolated, and integration tests should be expected to pass consistently unless there's a new kind of fault. That allows integration tests to capture all of those dependencies over time, until the full set approximates the set that exists in production.
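To make "properly isolated" concrete for the NTP case, here is a minimal sketch, assuming the date filter can take the current time as an argument instead of reading the system clock. `filter_recent` and the record layout are made up for illustration, not anyone's real API.

```python
from datetime import datetime, timedelta, timezone


def filter_recent(records, now, max_age=timedelta(days=1)):
    """Return records whose 'created' timestamp is at most max_age before now.

    `now` is passed in rather than read from the system clock, so a skewed
    or unsynchronized clock cannot change the outcome of a test.
    """
    return [r for r in records if timedelta(0) <= now - r["created"] <= max_age]


def test_filter_recent_with_pinned_clock():
    # Hypothetical regression test: the clock is pinned, so NTP state is irrelevant.
    now = datetime(2020, 1, 2, 12, 0, tzinfo=timezone.utc)
    records = [
        {"id": 1, "created": now - timedelta(hours=1)},
        {"id": 2, "created": now - timedelta(days=3)},
    ]
    assert [r["id"] for r in filter_recent(records, now)] == [1]
```

The integration test can still run against the real clock; the pinned version only guards the filtering logic against that one known fault.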

Don't worry too much about the precise dividing line between functional and integration tests, though. The important thing is that they're not synonyms. Whatever one calls them, there are different classes of tests with different purposes. Statements like "tests should never fail" or "tests that fail are better" are too general to be useful across all kinds of tests.

You clearly have not worked on a codebase with thousands of tests. At my previous job the build system had an option to run a test N times concurrently in the cloud. I used this whenever I wanted to commit to some other project but some of their tests were garbage (to prove that the test is flaky, and therefore to be ignored). You could even binary search (running 1000 times on each pivot point) to see who introduced the flakiness. Expensive but gets the job done.
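A rough local sketch of that workflow, not the actual build system option: `failure_rate` reruns a test callable and counts assertion failures, and `first_flaky_commit` bisects a commit list. Both names are hypothetical, and the bisection assumes the oldest commit is clean, the newest is flaky, and flakiness persists once introduced.

```python
def failure_rate(test, runs=1000):
    """Run `test` `runs` times; return the fraction of runs that raised AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures / runs


def first_flaky_commit(commits, is_flaky):
    """Binary search for the first commit where is_flaky(commit) is true.

    `commits` is ordered oldest to newest. Assumes commits[0] is clean and
    commits[-1] is flaky; neither endpoint is re-checked here.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] clean, commits[hi] flaky
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_flaky(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]
```

In practice `is_flaky` would check out the commit and call something like `failure_rate(...) > 0`. A rare flake can still slip through 1000 clean runs, so the result is evidence rather than proof.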

In my projects I either fix the nondeterminism or delete such tests.

Pseudorandom deterministic tests have their value, presuming you store faulty input and/or seed.

These are not exactly nondeterministic, but sometimes people end up with nondeterministic tests when they meant to write pseudorandom ones.
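A minimal sketch of the seeded variant, assuming a toy `dedupe` function as the code under test (it and the other names are hypothetical). The input is derived from the seed alone, and the seed is put into the assertion message so a failing case can be stored and replayed.

```python
import random


def dedupe(items):
    """Hypothetical function under test: drop duplicates, keep first occurrences."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def run_random_case(seed, size=100):
    """Build the input from `seed` alone and check dedupe() against it."""
    rng = random.Random(seed)  # private generator, independent of global random state
    data = [rng.randint(0, 20) for _ in range(size)]
    result = dedupe(data)
    # The seed is part of the failure message so the exact input can be rebuilt.
    assert len(result) == len(set(data)), f"duplicates left, seed={seed}"
    assert set(result) == set(data), f"values lost, seed={seed}"


def test_dedupe_random(seed=None):
    # Use a fresh seed unless one is supplied to replay a stored failure.
    seed = random.randrange(2**32) if seed is None else seed
    run_random_case(seed)
    return seed
```

Calling `test_dedupe_random(seed=<stored value>)` replays a recorded failure exactly. Without the stored seed, the same test is just a flaky test.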