Improving end-to-end test reliability | HN Mirror

	Improving end-to-end test reliability (frantic.im)
	57 points by dailymorn 1606 days ago

8 comments

vthommeret 1604 days ago

I just want to plug Playwright by Microsoft as I've been using it over the past month and have had a really great experience with it: https://playwright.dev

It's built by the founders of Puppeteer which came out of the Chrome team. Some things I like about it:

1. It's reliable and implements auto-waiting as described in the article. You can use modern async/await syntax, and before acting on an element it ensures the element is attached to the DOM, visible, stable (not animating), able to receive events, and enabled: https://playwright.dev/docs/actionability

2. It's fast — It creates multiple processes and runs tests in parallel, unlike e.g. Cypress.

3. It's cross-browser — supports Chrome, Safari, and Firefox out-of-the-box.

4. The tracing tools are incredible, you can step through the entire test execution and get a live DOM that you can inspect with your browser's existing developer tools, see all console.logs, etc...

5. The developers and community are incredibly responsive. This is one of the biggest ones — issues are quickly responded to and addressed often by the founders, pull requests are welcomed and Slack is highly active and respectful.
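
To make the auto-waiting in point 1 concrete, here is a minimal Python sketch of the general idea: poll a readiness check until it passes or a timeout expires, instead of sleeping for a fixed time. `FakeElement`, `is_actionable`, `wait_until`, and `click` are hypothetical names for illustration only; this is not Playwright's API or its implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeElement:
    """Hypothetical stand-in for a page element (not a Playwright type)."""
    attached: bool = False
    visible: bool = False
    stable: bool = False
    receives_events: bool = False
    enabled: bool = False


def is_actionable(el: FakeElement) -> bool:
    # The same five checks the comment lists for an element to be ready.
    return (el.attached and el.visible and el.stable
            and el.receives_events and el.enabled)


def wait_until(condition: Callable[[], bool],
               timeout: float = 1.0,
               interval: float = 0.01) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def click(el: FakeElement, timeout: float = 1.0) -> None:
    # Act only once the element is ready; fail loudly if it never is.
    if not wait_until(lambda: is_actionable(el), timeout=timeout):
        raise TimeoutError("element never became actionable")
```

The point of the design is that a test waits exactly as long as it needs to: a fast page passes quickly, and a slow one fails with a clear timeout instead of a confusing click on a half-rendered element.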

My prior experience with end-to-end tests was that they were highly buggy and unreliable and so Playwright was a welcome surprise and inspired me to fully test all the variations of our checkout flow.

wereHamster 1604 days ago

Have you used Cypress before? If yes I'd be interested in a comparison from your perspective.

vthommeret 1604 days ago

I did, but only very briefly. I originally wasn't looking for an E2E tool but was evaluating another tool for a different problem (Nx) which included Cypress as part of its opinionated defaults.

Cypress was a surprisingly nice experience as well and led me to research other modern e2e tools. Most of the points above can be compared against Cypress — Playwright supports parallel execution of tests within the same file on the same machine, which Cypress doesn't, and so is much faster. Cypress doesn't use modern async / await syntax. Due to its architecture, Playwright can test across tabs, work with iframes easily, which Cypress can't.

The UI for Cypress's developer tools is nice, but... as I said, Playwright's tracing UI is really excellent and the documentation is also really well done. This is also a personal thing, but I trust tools that came out of browser teams (Chrome) to emulate browsers in a more efficient way, e.g. spinning up cheap, isolated browser contexts in Chrome, the details of waiting for an element to be ready, etc...

Another post on this: https://alisterbscott.com/2021/10/27/five-reasons-why-playwr...

avensec 1604 days ago

I appreciate the article for visibility as we could always use more knowledge and partners in the Quality Engineering space. Just recognize that these are relatively low-hanging fruit/early maturity concepts in test engineering.

Are you discovering these for the first time? Great, happy that you are getting exposed! If you read these and think, "we could utilize these concepts with our engineers (test or not)," I would encourage you to look at it from an organizational perspective. You may want to add someone to your team(s) with these skillsets. Most automation testers understand these concepts well and can help you on the next-level maturity items.

goodusername 1604 days ago

A really hard problem that often arises when doing E2E tests is creating and managing test data.

If you have one or more integrations to external systems, where you cannot control your test data, it becomes much harder to write stable E2E tests.

Some don't have test environments, some have too few. Most don't allow you to set up data easily either way.

You can, of course, mock the external systems, but if they play a large enough part, your tests start looking more like integration tests again, but with the added overhead of something like browser automation.

It's a hard balance to strike.

8organicbits 1604 days ago

> When an E2E test is failing consistently and nobody cares to fix it, that means the test isn’t useful. There’s no point in having it around.

I suspect this is a good idea, but it raises some red flags for me. People may not want to fix tests if they don't feel like they have time, or if fixing tests won't help their promotion (i.e. culture). Of course if you have good engineering culture, this is probably a useful signal for tests to remove.

domesticsimian 1604 days ago

I think maybe the point of that quote is that failing tests are just noise. Either fix them or remove them. If I try to run all tests for some pull request and 20% fail, what does that tell me? In the case where we regularly have some number of continuously failing tests, it doesn't tell me much. Did my PR make it worse? Did my PR make it better? Having continuously failing tests definitely doesn't add value and explicitly makes things harder to reason about when looking at test results.
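
A small Python sketch of why a standing set of failures muddies the signal: to answer "did my PR make it worse?" you have to diff the PR's failures against a baseline run. The function name and test names are hypothetical.

```python
from typing import Dict, Set


def compare_to_baseline(baseline_failures: Set[str],
                        pr_failures: Set[str]) -> Dict[str, Set[str]]:
    """Split a PR's failing tests into new, fixed, and already-failing."""
    return {
        # Failing on the PR but not on the baseline: likely caused by the PR.
        "new_failures": pr_failures - baseline_failures,
        # Failing on the baseline but passing on the PR.
        "fixed": baseline_failures - pr_failures,
        # Failing in both runs: noise as far as this PR is concerned.
        "still_failing": baseline_failures & pr_failures,
    }
```

With an all-green baseline, every failure is a new failure and no diffing is needed, which is the argument for fixing or removing tests that fail continuously.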

avensec 1604 days ago

One of the reasons we test is for confidence. If we can't trust a test, it isn't providing value. It may give negative value due to the time required to inspect the failure or general erosion of trust in the test suite.

One pattern that we can apply to increase visibility or ownership is stability metrics. Whether a test must or should be fixed can often be teased out once you can view these metrics. On failure, display how many of the past x runs this test has passed in this configuration:

- Passed the last 100 runs? High likelihood the test is highlighting a bug and must be engaged on.
- 95% pass rate in the last 100 runs? It may be time to quarantine this test and add it to the remediation backlog.

Your acceptable false-positive rate may differ depending on team velocity and suite runtimes.
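
A rough Python sketch of that triage rule, assuming each test's recent results are available as a list of booleans (True = pass). The function names, labels, and the 0.95 threshold are illustrative assumptions, not a standard.

```python
from typing import Sequence


def pass_rate(history: Sequence[bool]) -> float:
    """Fraction of recent runs that passed (history must be non-empty)."""
    return sum(history) / len(history)


def triage_failure(history: Sequence[bool],
                   quarantine_at_or_below: float = 0.95) -> str:
    """Suggest a next step when a test has just failed.

    `history` holds the results of the runs *before* this failure.
    """
    if not history:
        return "no-history"
    rate = pass_rate(history)
    if rate == 1.0:
        # Previously rock solid, so the failure is probably a real bug.
        return "likely-real-bug"
    if rate <= quarantine_at_or_below:
        # Fails often enough that it is eroding trust in the suite.
        return "quarantine-candidate"
    # Occasional failures: needs a human decision.
    return "investigate"


def failure_banner(name: str, history: Sequence[bool]) -> str:
    """One-line context to show next to a failing test."""
    return f"{name}: passed {sum(history)} of the last {len(history)} runs"
```

The threshold is the knob mentioned above: a team with a slow suite may quarantine sooner than a team that can cheaply rerun everything.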

"How many tests are in quarantine, what is the average time-to-fix, and what direction is this trending" are valuable metrics that we can utilize to find ownership and highlight the technical debt.

As you said, culture around such patterns isn't always there.
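
For the quarantine metrics, a minimal Python sketch, assuming each quarantined test is tracked with the date it entered quarantine and the date it was fixed (if any). `QuarantineRecord` and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass
class QuarantineRecord:
    test_name: str
    quarantined_on: date
    fixed_on: Optional[date] = None  # None while still in quarantine


def open_count(records: Sequence[QuarantineRecord], as_of: date) -> int:
    """How many tests were in quarantine on the given day."""
    return sum(
        1 for r in records
        if r.quarantined_on <= as_of
        and (r.fixed_on is None or r.fixed_on > as_of)
    )


def average_days_to_fix(records: Sequence[QuarantineRecord]) -> Optional[float]:
    """Mean time-to-fix in days over fixed tests, or None if none are fixed."""
    durations = [(r.fixed_on - r.quarantined_on).days
                 for r in records if r.fixed_on is not None]
    return sum(durations) / len(durations) if durations else None


def trend(records: Sequence[QuarantineRecord],
          earlier: date, later: date) -> int:
    """Change in quarantine size between two days; positive means growing debt."""
    return open_count(records, later) - open_count(records, earlier)
```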

mleonhard 1604 days ago

I wish there were tools for small teams to achieve this level of sophistication. It seems like only massive corporations can do testing really well, because they can afford to assign multiple engineers to build and maintain their bespoke test systems.

I'm a solopreneur building an app with Flutter. Flutter's testing support is mostly broken and/or unwritten. It's very frustrating.

0xbadcafebee 1604 days ago

This article is spot on. For $LargeNetworkHardwareVendor we maintained three different automation test frameworks for end-to-end testing. Our tests were more abstract functions that were given arguments for a particular test case. Those were then made into collections of tests that could be re-used. A configuration file allowed QE to build new test cases without programming knowledge. QE would write configs and occasionally one or two of them that could code would modify the test framework. All the tests ran in a scheduler from clusters of test-running manager-servers against globally distributed labs of hardware. While teams did have unit tests and functional tests, the end-to-end test was king (and necessary given the multiple levels of interface for that gear)

A lot of reliability in that system came from being able to quickly iterate on different levels of the system. The easier it was to solve a failure where it's happening, the more likely your bugs can be fixed quickly, so you have a healthy system (as opposed to suffering from entropy and tech debt)

TotempaaltJ 1604 days ago

I love learnings on automated testing. From the perspective of someone who isn't used to TDD or even just building many tests, maintaining E2E tests often seems extremely cumbersome. I wonder if I'm just missing out on the best practices, or if the tooling simply hasn't evolved enough yet.

martinald 1604 days ago

The payoff is much higher imo though. Of all the tests we do, e2e catches by far the most problems. Indeed the biggest mistakes I've made often are me thinking tests are flakey 'because e2e' when in reality they are showing a glaring problem.

Especially in mobile/web applications where you are often consuming loads of services/libraries/sdks, some in house, some external, you are often running a tiny amount of your own code. Adding tonnes of unit tests to that is sort of missing the big picture - you need to test it all works together as a user would.

sidlls 1604 days ago

That's a rare attitude to have in the Bay Area anyway. Unfortunately. Everyone wants lots of unit tests because they run fast and give (roughly) instant feedback. Unit tests paint an incomplete picture. Too little attention given to integration tests and end-to-end tests leaves systems exposed to critically bad edge case bugs.

mgkimsal 1604 days ago

> Too little attention given to integration tests and end-to-end tests leaves systems exposed to critically bad edge case bugs.

From my POV and experience, the middle ground is often what people refer to as 'integration' tests. Testing (without a browser), hitting endpoints/urls with known payloads and getting expected results catches errors with assumptions made about the interaction between various individual libraries.
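
A minimal Python sketch of that kind of test, using only the standard library: a tiny WSGI app is called in-process (no browser, no socket) with known requests, as different identities, and the responses are checked. The `ORDERS` data, the `X-User` header, and the routes are all hypothetical.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory data for an illustrative "orders" endpoint.
ORDERS = {"1": {"owner": "alice", "total": 30}}


def app(environ, start_response):
    """Tiny WSGI app: GET /orders/<id>, readable only by the order's owner."""
    user = environ.get("HTTP_X_USER")
    parts = environ["PATH_INFO"].strip("/").split("/")
    if environ["REQUEST_METHOD"] != "GET" or len(parts) != 2 or parts[0] != "orders":
        status, body = "404 Not Found", {"error": "not found"}
    elif parts[1] not in ORDERS:
        status, body = "404 Not Found", {"error": "no such order"}
    elif ORDERS[parts[1]]["owner"] != user:
        status, body = "403 Forbidden", {"error": "forbidden"}
    else:
        status, body = "200 OK", ORDERS[parts[1]]
    payload = json.dumps(body).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def request(method, path, user=None):
    """Call the app directly and return (status_code, decoded_json_body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["REQUEST_METHOD"] = method
    environ["PATH_INFO"] = path
    if user is not None:
        environ["HTTP_X_USER"] = user
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return int(captured["status"].split()[0]), json.loads(body)
```

Checks such as `request("GET", "/orders/1", user="bob")` returning a 403 exercise routing, authorization, and serialization together, which is the layer between unit tests and browser-driven E2E tests.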

At least in the web app world, my views are:

1. Testing the individual libraries gets you one layer of confidence.

2. Testing the interaction of those, usually via URL endpoints as various identities, gets you another layer of confidence.

3. Testing with E2E exposes primarily UI/JS problems.

When the first 2 are strong/solid, you can focus troubleshooting problems in #3 at the client/JS level first. It's not always the case, but it can help reduce concerns about "is this a back-end issue?".

I've been (slowly) trying to write more js component tests (in one case, with jest and vue), as it makes it easier/faster to test many permutations of input/validation/etc all at once. It's yet another 'confidence' area such that, when there are E2E tests, I can narrow down focus even more.

On a couple projects I've been on the past few years, we've found very few problems via E2E tests alone, mostly because there are so many back-end unit and integration tests. The E2E issues that are found are often UI-only (error state changes not rendering, sometimes perf issues, etc).

bluGill 1604 days ago

The problem I have with unit tests is they inhibit refactoring of the API they test.

Sure if you write your own implementation of "string" or "list" you will probably get the API right the first time - those are commonly used and time tested so everyone knows about what the API should be. However almost nobody is writing them, they come with your language for everyone but a few language implementers, or once in a while the company library implementers.

For everyone else we are writing to a business requirement that isn't well understood and may change. However the purpose of a unit test is to assert that no matter what this won't change. So every time you want to make a change all those tests are in the way of the change and need to be fixed.

Everyone writing tests needs to figure out their own middle ground. Because end to end tests have their own problems.

Jtsummers 1604 days ago

That stretches the idea of a refactor, though. If a refactor changes the external behavior it's not really a refactor, which is a structural change. Unit tests when applied at an API level (which may be a very small unit or up to the level of a library, but that also stretches the definition of "unit test" depending on the size of the library) are there to ensure that changes to the internals don't impact the behavior.

As soon as you start changing the behavior, you have to change the unit tests. If you're adding behavior, you have to add tests. If you're removing behavior, you remove tests. If you're changing the way a procedure works, you change the related unit tests.

Really, any behavior change requires changes to the tests (whatever level they may be, if you want a high degree of test coverage).
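
A small Python illustration of the distinction, with hypothetical function names: the same API-level checks pass against the original and the refactored implementation, because only the internals changed. If the expected outputs had to change, it would be a behavior change rather than a refactor.

```python
from typing import Callable, List


def unique_v1(items: list) -> list:
    """Original: order-preserving de-duplication via list membership."""
    result: List = []
    for item in items:
        if item not in result:
            result.append(item)
    return result


def unique_v2(items: list) -> list:
    """Refactored internals: tracks seen items in a set, same behavior."""
    seen = set()
    result: List = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def check_contract(unique: Callable[[list], list]) -> None:
    """API-level tests: they pin down behavior, not implementation."""
    assert unique([]) == []
    assert unique([3, 1, 3, 2, 1]) == [3, 1, 2]
    assert unique(["a", "a"]) == ["a"]


# Both implementations satisfy the same contract.
check_contract(unique_v1)
check_contract(unique_v2)
```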

mgkimsal 1604 days ago

And without relatively comprehensive tests, you can't ever tell if a 'refactor' (in the structural sense of the word) actually worked. Did you change internal implementation code without affecting the consumer output? Without tests, you can't reliably tell with a high degree of confidence.

bluGill 1604 days ago

Units by definition are not external behavior. Sometimes they change external behavior, but there are a lot of changes that make code cleaner without changing external behavior. All too often I've discovered after a few years that I really need to split some unit into two.

eatonphil 1604 days ago

Yep same. For libraries unit tests are great. For applications though I feel the most value writing integration tests and e2e tests. That's what helps capture the biggest user-facing bugs.

Afton 1604 days ago

It's about tradeoffs. On one end you have precision, speed, reliability, diagnosability. At the other end you have "realness".

Unit tests fall on the far left, workload tests/E2E tests/testing-in-production fall on the far right.

It turns out that there's no 'wrong' level, there's just different tradeoffs. I've worked at a lot of companies that embraced the realness of E2E tests, but then suffered from the maintenance/performance/diagnosability/instability of those tests. I have colleagues who worked at places that avoided E2E at all costs, and suffered because they would have a green test run, but user scenarios that a simple E2E test would have caught, were completely broken.

IMO there is a lot that can be done to improve E2E testing at most companies, but they definitely have the capacity to add value to your release/testing pipeline.

spuz 1604 days ago

The title should mention this article is from 2019. I wonder if Facebook testing practices have changed since then.