Hacker News
by tossaway9000 1689 days ago
How about you fix the flaky tests? Am I insane for thinking that? The whole concept of treating "just reboot it" or "re-run it" as "fixing" the problem is at least one reason the modern world sits on a mountain of complete garbage software.
3 comments

This is how we think about testing for the most part - if a test is 'flaky', it gets looked at very quickly, and if it's not urgent (e.g. the behavior is fine and it's actually a flake), it's skipped in code.

Once the test is skipped, a domain expert can come back and take a look and figure out why it was flaky, and fix it.
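For what it's worth, here is a minimal sketch of what "skipped in code" could look like with Python's standard `unittest` module. The `skip_flaky` helper, the `owner` argument, and the test names are made up for illustration; they are not from the commenter's codebase.

```python
import unittest


def skip_flaky(reason, owner):
    """Hypothetical helper: skip a known-flaky test and record who owns the fix."""
    return unittest.skip(f"flaky: {reason} (owner: {owner})")


class CheckoutTests(unittest.TestCase):
    @skip_flaky("intermittent timeout waiting for payment stub", owner="payments-team")
    def test_payment_roundtrip(self):
        # Never executed while the skip decorator is in place.
        self.fail("not reached while the test is skipped")

    def test_cart_total(self):
        self.assertEqual(sum([2, 3]), 5)


def run_suite():
    """Run the example tests and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_suite()
    for test, reason in result.skipped:
        print(f"SKIPPED {test.id()}: {reason}")
```

Keeping the reason and an owner in the skip message means the skipped list doubles as the backlog the domain expert works from.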

If it's urgently broken (e.g. there is real impact), we treat it like an incident and gather people with the right context to fix it quickly.

As long as everyone agrees to these norms, it's not a huge burden to keep this up with thousands of tests. People generally write their tests to be more resilient when they know they're on the hook for them not being flaky, and nobody stays blocked for long when they are permitted to skip a flaky test.

Curious, how often do you see a flaky test in your system? In my past experience at a mid-size startup, we used to get a new flaky test almost weekly in a monorepo. We started flagging them as ignored (we created a separate tag for flaky tests), but later realized that the backlog of flaky tests waiting to be fixed never came down.

In another case I observed, devs just got used to rerunning the entire suite (the flakiness there was about 10-20%).
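To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the 10-20% figure means the share of suite runs that fail spuriously, and that flakes are independent of each other; both are assumptions, not something stated above. The function names are made up.

```python
def suite_flake_rate(per_test_rate: float, flaky_tests: int) -> float:
    """Chance that at least one of `flaky_tests` independent flaky tests fails in a run."""
    return 1 - (1 - per_test_rate) ** flaky_tests


def expected_runs_until_green(suite_rate: float) -> float:
    """Mean number of full-suite runs until one passes (geometric distribution)."""
    if not 0 <= suite_rate < 1:
        raise ValueError("suite_rate must be in [0, 1)")
    return 1 / (1 - suite_rate)


if __name__ == "__main__":
    # 20 tests that each flake 1% of the time already make the suite fail
    # spuriously on roughly 18% of runs.
    print(round(suite_flake_rate(0.01, 20), 3))
    # At a 20% spurious failure rate, expect 1.25 suite runs per green build.
    print(expected_runs_until_green(0.2))
```

Under these assumptions, a small per-test flake rate compounds quickly across a large suite, which is one way a team ends up rerunning everything out of habit.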

Haha, great point. What we have learned from our users is that "fixing" tests typically ends up meaning "delete most of them". Fixing tests can be a time-consuming effort.

Another way to think about it is to ask whether flaky tests are worth keeping. At some point, if the tests fail often, do they really add value? We think they do. If you can tell flakiness apart from real failures and reduce the noise, you can still catch real failures.
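One common heuristic for telling the two apart is to rerun a failed test on the same commit: if any rerun passes, the failure was probably a flake; if every rerun fails, it is probably real. The sketch below illustrates that heuristic only. It is not a description of how any particular product does it, and `classify_failure` and its labels are hypothetical.

```python
from typing import Callable


def classify_failure(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """Classify a test that has just failed once.

    `run_test` reruns the test on the same commit and returns True on pass.
    Returns "flaky" if any rerun passes, otherwise "consistent-failure".
    """
    for _ in range(reruns):
        if run_test():
            return "flaky"
    return "consistent-failure"


if __name__ == "__main__":
    # Simulated rerun outcomes: fails once more, then passes.
    outcomes = iter([False, True])
    print(classify_failure(lambda: next(outcomes)))
```

A pass on rerun only shows the outcome is nondeterministic. It does not show the code is correct, so a "flaky" label still deserves a ticket.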

Wow. That sounds like really poor technical leadership. Fixing flaky tests (as opposed to deleting them) is indeed time-consuming, but it is a far cheaper choice than getting to the point where your test suite is untrustworthy.

There may be a point where the cost of ownership for a specific test exceeds its utility, but the way to resolve that is usually to reevaluate your code and supporting tests. Suppressing flaky tests seems a very unwise choice.

Perhaps under extreme circumstances and with unhealthy code bases there may be a case for this, but I struggle to imagine it.

That is a fair argument. Not all organizations have the bandwidth to measure and manage the stability of their builds. Some companies build internal tools or a dev productivity team for this purpose. People always comment out the flaky test with the good intention of coming back to it, but in most cases that is a very low-priority item when you have to ship new features.

Fixing flaky tests can very commonly take longer than writing new tests.

Let me give you the example of a test that hadn't ever failed on a dev machine or on staging or prod, just on the flaky CI infrastructure.

Yes, I'm mostly agreeing with you that the tests should be fixed, but I have seen ones that were perfectly fine (given the constraints) and what should have been fixed was the CI.

Yes, but even in that scenario, it should be consistent, not flaky. If it always works in some environments and always fails in others, that is not ideal, but at least it can be accepted. But if it sometimes works and sometimes fails in the same environment, it should be investigated.
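A rough sketch of that distinction, assuming you have a history of (environment, passed) records for a single test. The record shape, labels, and function name are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_by_environment(records: Iterable[Tuple[str, bool]]) -> Dict[str, str]:
    """Label each environment by whether one test's outcomes there are consistent."""
    outcomes = defaultdict(set)
    for env, passed in records:
        outcomes[env].add(passed)

    labels = {}
    for env, seen in outcomes.items():
        if seen == {True}:
            labels[env] = "consistent-pass"
        elif seen == {False}:
            labels[env] = "consistent-fail"
        else:
            # Both passes and failures in the same environment: investigate.
            labels[env] = "flaky"
    return labels


if __name__ == "__main__":
    history = [("dev", True), ("dev", True), ("ci", True), ("ci", False)]
    print(classify_by_environment(history))
```

An environment labelled "consistent-fail" points at an environment difference, while "flaky" points at nondeterminism in the test or the infrastructure it runs on.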