Hacker News
by mceachen 2576 days ago
Every company I've founded or worked for has struggled with flaky tests.

Twitter had a comprehensive browser and system test suite that took about an hour to run (and they had a large CI worker cluster). Flaky tests could and did scuttle deploys. It was a never-ending struggle to keep CI green, but most engineers saw de-flaking (not just deleting the test) as a critical task.

PhotoStructure has an 8-job GitLab CI pipeline that runs on macOS, Windows, and Linux. Keeping the ~3,000 (and growing) tests passing reliably has proven to be a non-trivial task, and researching why a given test is flaky on one OS versus another has almost invariably led to discovery and hardening of edge and corner conditions.

It seems that TFA only touched on set ordering, incomplete db resets and time issues. There are many other spectres to fight as soon as you deal with multi-process systems on multiple OSes, including file system case sensitivity, incomplete file system resets, fork behavior and child process management, and network and stream management.
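To make two of those categories concrete (set ordering and time), here is a minimal sketch in Python. It is not from TFA or from PhotoStructure; `tag_summary` and `is_expired` are hypothetical helpers invented for illustration. The idea is to sort anything derived from a set before comparing it, and to pass the clock in so a test can pin it.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable


def tag_summary(tags: Iterable[str]) -> str:
    """Hypothetical helper: join unique tags in a stable order.

    Sorting removes any dependence on set iteration order, which can
    differ between runs and platforms.
    """
    return ", ".join(sorted(set(tags)))


def utc_now() -> datetime:
    """Default clock used in production code."""
    return datetime.now(timezone.utc)


def is_expired(
    created_at: datetime,
    ttl: timedelta,
    now: Callable[[], datetime] = utc_now,
) -> bool:
    """Hypothetical helper: the clock is a parameter so tests can fix it."""
    return now() - created_at >= ttl


# In a test, pin the clock instead of depending on the wall clock.
FIXED_NOW = datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc)


def fixed_clock() -> datetime:
    return FIXED_NOW
```

A test would then call `is_expired(created_at, ttl, now=fixed_clock)` and get the same answer regardless of when or how slowly CI runs it.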

There are several things I added to stabilize CI, including robust shutdown and child process management systems. I can't say I would have prioritized those things if I didn't have tests, but now that I have them, I'm glad they're there.
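For anyone wondering what child process management can look like in a test harness, here is a rough sketch in Python. It is not PhotoStructure's code; `managed_child` and the `grace_seconds` parameter are names made up for this example. It asks the child to exit, waits a bounded time, then kills it so nothing leaks into the next test.

```python
import subprocess
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def managed_child(
    args: Sequence[str], grace_seconds: float = 5.0
) -> Iterator[subprocess.Popen]:
    """Hypothetical helper: start a child and guarantee it is gone on exit."""
    proc = subprocess.Popen(args)
    try:
        yield proc
    finally:
        if proc.poll() is None:  # child is still running
            proc.terminate()  # polite request first
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()  # escalate if it ignored the request
                proc.wait()  # reap it so no zombie is left behind
```

Usage would be something like `with managed_child([sys.executable, "worker.py"]) as proc: ...`, where `worker.py` is a placeholder script. Grandchildren and process groups need extra, OS-specific handling that this sketch does not attempt.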


In my experience complex end-to-end tests cast a wide net that often finds a lot of issues, and they provide enormous value. Their main drawback is the maintenance needed to keep them robust, as the article discusses, and hardening tests can take a lot of investment. That said, the alternative (not having them) is worse, so your approach, and the author’s, is what I’ve often done too. I think there needs to be an understanding (across the team and management) that automated tests are software and require dev effort similar to maintaining any other software, especially because there usually aren’t tests to test the tests!

I’m founder at Tesults (https://www.tesults.com) where we have a flaky test indicator that makes identifying these tests easier. It’s free to try and if you can’t get budget for a proper plan send me an email and I’ll do what I can.

In general the only way to never have flaky tests is to have simpler tests, but I find those often don’t provide as much value. That’s just my personal belief after having spent years focused on automated tests: e2e tests do have robustness issues, but the bugs they find make them totally worth it. Of the issues mentioned in the article, timing is the one that has affected my tests the most. It can be overcome, though. I’ve run test suites with a couple of thousand e2e (browser) tests that were highly robust and reliable after time was devoted to hardening them. You do have to focus on that, and in some cases refuse to add new test cases until the existing ones are sorted out.
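One common way to harden timing-sensitive tests is to replace fixed sleeps with a bounded poll. Here is a minimal sketch in Python; `wait_until` is a hypothetical helper written for this comment, not something from the article or from any particular test framework, though many browser-testing tools ship their own equivalents.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True as soon as the condition holds, False on timeout.
    Uses a monotonic clock so wall-clock adjustments cannot affect it.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test then asserts on `wait_until(lambda: page_is_ready())` rather than sleeping for a guessed duration, with `page_is_ready` standing in for whatever check the test actually needs. Fast machines finish early, and slow CI workers get the full timeout before the test fails.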

Sorry for the OT, what is "TFA"?
Sorry. The Fine Article. I didn't mean it in the disparaging sense.

It's a reference to RTFM, Read The Fine Manual.

TIL: RTFM was a phrase from the 40s: "Read the field manual."

I’ve never seen the F in RTFM mean “fine” before. I’ve always seen it used as the more vulgar “read the f*ing manual”.
I believe that's the joke.
It's the same as OP, except it only means the Post, not the Poster. (The F* Article.)

Usually it's a kind of negative retort - 'well if you'd actually bothered to read TFA then ...' - but increasingly it seems to be used without such emotion (particularly, to me anyway, on HN) to mean simply 'the submission'.

I always read it as "the featured article".