by manacit
1691 days ago
This is how we think about testing for the most part: if a test is 'flaky', it gets looked at very quickly, and if it's not urgent (e.g. the behavior is fine and it's actually a flake), it's skipped in code. Once the test is skipped, a domain expert can come back, figure out why it was flaky, and fix it. If it's urgently broken (e.g. there is real impact), we treat it like an incident and gather people with the right context to fix it quickly. As long as everyone agrees to these norms, it's not a huge burden to keep this up with thousands of tests. People generally write more resilient tests when they know they're on the hook for flakiness, and nobody stays blocked for long when they're permitted to skip a flaky test.
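For anyone wondering what "skipped in code" can look like, here is a minimal sketch using Python's standard `unittest`. The class name, test names, and skip reason are all made up for illustration, and this is not necessarily the framework or convention the team above uses. The idea is only that the skip lives next to the test, with a reason that explains why it was skipped and that someone is expected to follow up.

```python
import unittest


class CheckoutTests(unittest.TestCase):
    # Hypothetical flaky test: skipped in code so nobody stays blocked,
    # with the reason recorded for whoever comes back to investigate.
    @unittest.skip("flaky: intermittent timeout, behavior verified OK; owner to investigate")
    def test_payment_webhook_retries(self):
        self.fail("stand-in for an intermittently failing assertion")

    # A normal, stable test that keeps running.
    def test_cart_total(self):
        self.assertEqual(sum([2, 3]), 5)


def run_suite() -> unittest.TestResult:
    """Run the example tests and return the collected result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CheckoutTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    result = run_suite()
    for test, reason in result.skipped:
        print(f"SKIPPED {test.id()}: {reason}")
    print("suite green:", result.wasSuccessful())
```

Because the skip reasons are collected in `result.skipped`, it is easy to list what is currently skipped and why, which helps keep skipped tests from being forgotten.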
In another case I've seen, devs just got used to rerunning the entire suite (the flakiness there was about 10-20%).
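To put a rough number on the rerun habit: if you read "10-20%" as the chance that any given full-suite run fails for flaky reasons, and assume runs fail independently of each other (both are assumptions, not something stated above), then the number of runs until a green one follows a geometric distribution with mean 1 / (1 - p). The function name below is just illustrative.

```python
def expected_runs_until_green(flake_rate: float) -> float:
    """Expected number of full-suite runs until one passes.

    Assumes each run fails with probability `flake_rate`, independently
    of earlier runs, so the run count is geometrically distributed.
    """
    if not 0.0 <= flake_rate < 1.0:
        raise ValueError("flake_rate must be in [0, 1)")
    return 1.0 / (1.0 - flake_rate)


if __name__ == "__main__":
    for p in (0.10, 0.20):
        runs = expected_runs_until_green(p)
        print(f"flake rate {p:.0%}: about {runs:.2f} runs per green build")
```

Under those assumptions that is an extra 11-25% of suite time on average, which may be why rerunning feels tolerable even though nothing actually gets fixed.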