Hacker News
by guygurari 2900 days ago
The article makes some fair points for performing end-to-end tests. There is however a benefit of unit tests over end-to-end tests that is not mentioned: code locality. The cost of fixing a discovered bug seems to grow exponentially with the amount of code in which the bug may appear. Unit tests typically cover a small part of the code. If a unit test fails, isolating and fixing the bug is typically cheap. On the other hand, if an end-to-end test fails (due to the same bug) then isolating and fixing the bug can be quite costly because there are many different parts of the code that need to be checked. To me, code locality is the main advantage of writing unit tests.

At the end of the day, there is a spectrum of tests going from unit tests to end-to-end tests. The spectrum represents several trade-offs, such as code locality vs. coverage. In my experience, the most economical approach is to write a balanced mix of tests that lie along this spectrum.


I hear this argument a lot, but I do not think it is valid. What happens is that you simply put in the time before a bug occurs with every unit test that you write. Which isn't very efficient, seeing as not all of your units will lead to bugs. So yes, once you have all of your Units covered 100% it will be easier to find bugs, but you have invested a lot of time in order to get here and in order to keep the unit-test-suite maintained.
But the goal of tests isn't just to find bugs in newly-written code. It's to defend against regressions, and a regression is even harder to localize, given the team may not even be well-familiar with the failing code.
This is a spot where the potential for test-induced design damage comes into play.

With bite-size integration tests, I find it's generally not too hard to isolate the cause of a failing test, because the code it's testing tends to be straightforward, and fairly easy to step through, if necessary.

I frequently have a harder time with unit tests. The code ends up involving a lot of extraneous abstractions that I need to think through. The test code is often so heavily mocked that it's hard to distinguish the behavior under test from stuff that's just being mocked or stubbed to get the SUT to run cleanly, meaning I've got to start with trying to figure out whether the bug is in the test or the code being tested.

It gets worse in long-lived code bases, where the unit tests are often subject to significant bit rot on account of how brittle they are. I've definitely had some code archaeology excursions reveal that the reason an entire suite of tests was tautological is that someone was doing a nominally unrelated refactor, and just put in the minimum effort necessary to get the tests to go green again.

You can argue that developers need to be more diligent. Me, though, I figure it's sort of like those lines of bare dirt you see crisscrossing the lawns of university campuses: when things get to that point, it's a sign that the official way of getting around isn't appropriate to most people's real needs.

I should say, I was complaining there of code that is pervasively unit tested, not unit tests in general.

I do think it's important to have unit tests when the unit's behavior is complicated. Where I start to get worried is when there are unit tests being written against classes that have very little behavior that doesn't involve interaction with some other module.

> I do think it's important to have unit tests when the unit's behavior is complicated. Where I start to get worried is when there are unit tests being written against classes that have very little behavior that doesn't involve interaction with some other module.

I think this is precisely where unit test suites start to have problems. Good, flexible unit testing requires a lot of judgement about what will be useful to test and what will be too much of a burden in the future. Unfortunately judgement is hard to acquire and even more difficult to teach, and a lot of teams want to create and enforce over-dogmatic testing "standards." When unit testing, you have to balance:

1. What testing do I need to have confidence my code is working?

2. What testing do I need to catch likely regressions?

3. What kinds of tests will just get in my way in the future or are literally useless [1]?

[1] E.g. unit tests that essentially only test core language functionality, once you take out all the mocks.
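
To make the footnote concrete, here is a minimal sketch in Python of the kind of test it describes. The names `PriceService`, `repo`, and `fetch_price` are hypothetical and not taken from any code base discussed in this thread. Once the mock is set aside, about the only thing left under test is that a return value gets forwarded.

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical pass-through class: it only delegates to a repository."""

    def __init__(self, repo):
        self.repo = repo

    def get_price(self, sku):
        return self.repo.fetch_price(sku)


def test_get_price_is_tautological():
    # The mock is told to return 42, and the test then checks for 42.
    repo = Mock()
    repo.fetch_price.return_value = 42

    service = PriceService(repo)

    # This passes no matter what a real repository would do, so it mostly
    # verifies that Python can forward a return value.
    assert service.get_price("abc") == 42
    repo.fetch_price.assert_called_once_with("abc")
```

A test like this raises the coverage number for `PriceService` while saying nothing about whether prices are correct.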

This is a really important aspect of good testing that doesn't seem to get as much attention as it deserves - the tests that have brought the most value for me are the ones that assert the business requirements, not the implementation details.

So for a little passthrough/orchestration class, it probably doesn't make sense to do much testing. For something that actually performs business logic, that's a prime candidate for testing. I've seen plenty of tests that just seem to aim to increase coverage, heck, I've written plenty of those myself - but at the end of the day, the benefit they serve after being written is probably minimal.

Unit tests that cover complex regression scenarios are difficult to identify up front and are missed frequently.
I agree that regression scenarios are difficult to identify; this is why it's good to have unit tests to begin with.

You're only unit testing the code how you 'intended' for it to work at that time. Even though the tests are written, it probably wouldn't be uncommon for a bug to slip through when running your code. What you can then do is write another test to account for that scenario, then repeat, and your code becomes more robust as a result.

This is both correct and incorrect.

Correct that the goal is to prevent regressions. I claim (I don't know how to study this) 80% of your tests will never fail and so they could safely be deleted - but I have no insight into which tests will fail so I say keep them all.

Incorrect because in fact it isn't hard to localize failures: it is something in the code you just touched!

> it is something in the code you just touched!

Yes, but you also need to localize the effect of the bug to know why the code you changed broke the program (and remember that we're talking about a case where the rest of the program is not familiar enough to you). Good unit tests can help you find the immediate effect of the failure, rather than the ultimate one.

I don't understand how you could live the experience that the problem is always with the just-edited code. The closest I can come is supposing that you've always worked with thoroughly unit-tested code (and correct, well-documented libraries, etc.).
I'm lucky enough that the code base I work on was a "big rewrite" not long ago, with the goal of having everything tested. Also, we are an embedded system where we know exactly which version of each library to support, and upgrading any library is in itself a big deal done as a separate exercise.
Maybe I'm a worse than average coder, but I find it nearly impossible to write a "unit" of code without also writing one or two bugs along the way. So whether you do your testing ad hoc as you develop, or write unit tests, you do have to spend time in testing each unit of code you write. It doesn't seem to me that formalizing this into a repeatable unit test is adding a lot of real extra work.
I would suggest trying to write correct code without unit tests. They fill up mental capacity, making it harder to think about and then write correct code in the first place.

Keeping below 1 bug per 100 lines of code is viable simply by being careful and thinking things through. That's a long way from perfection, but it really helps.

Advice that boils down to “try harder” typically provides no value. If you want an outcome to change, you have to change an input. Better tooling or better processes may achieve that. Trying harder rarely will, because most people are already trying quite hard to do a good job.
I think people may be reading this backwards. I am specifically saying don't try as hard. Don't think about unit tests or anything else when writing correct code for a function.

Once you can generally write mostly correct code then you can work on improving your process. But, until most of your functions are working when you right them just work on improving that.

Edit: And yea quality may drop as part of this transition, but you need to get a feel for how much you can pay attention to.

You’re saying that if you have a problem with code correctness, the solution is to just try harder to write correct code.

The tests are there specifically to help find the errors that “trying harder” didn’t catch. You don’t get a higher quality result by cutting QA.

> until most of your functions are working when you right them

Heh...irony alert. You're advocating focusing on just getting it right the first time and you didn't get write right.

But seriously, the people you're responding to are correct. Any process that relies on humans being less error prone is bound to fail. You need to either create a process that makes humans less error prone (e.g. checklists) or embrace our propensity for making errors and plan for that eventuality.

I find that writing unit tests helps me think about the code so that I write better code. Plus it generally forces me to make the code more modular than I might otherwise. YMMV
If it's working for you then great. I am specifically responding to:

> I find it nearly impossible to write a "unit" of code without also writing one or two bugs along the way

Finding bugs early is great, but minimizing bug creation is even better.

Then why not think "in-code"? I mean, thinking through all aspects of edge cases and the like would likely result in some written documentation. Why not do it in code then?
You can only focus on a relatively small number of different things at the exact same time.

  1 3 2 5 9 2 8 7
vs.

  1 3 2 5

  9 2 8 7
Memorizing the first sequence is trivial, but dealing with each half independently is much less mental effort. Now each idea can be more complex than just a number, but even still you make fewer mistakes by removing mental overhead.

PS: Now, there are a lot of tricks on how you can get better at this stuff. But if the parent poster tends to write bugs in most functions then simplifying things may help.

I have found unit tests are useful for the original developer to very explicitly describe what test cases the code was written for (including parameter data types). The lack of unit tests for specific types of use cases can also signal to the inheriting developer what the code wasn't originally expected to do (which can signal performance limitations).
It's a trade off of when you want to spend the time.

Sure, it's less efficient to write unit tests ahead of time for every case, but in a lot of cases fixing a bug faster when it's discovered is much more important than even double the hours spent during less "pressing" times.

But I think the argument that the article is making is that it's not only the case that you're moving the time spent fixing bugs, (i.e. a greedy vs lazy strategy), the issue is also that the baseline maintenance cost of the code base is increased by orders of magnitude when unit tests are considered.

I tend to agree - in my personal experience in organizations with strict TDD culture, a perverse incentive often emerges to preserve existing flawed architecture over obviously better solutions just because it's so painful to deal with all the tests.

One of software development's most powerful properties is the ability to iterate quickly: it's foolish to prioritize dogmatic beliefs about testing over that quality.

Yeah, I absolutely agree that in many cases tests become more of a burden than a help, but I think that isn't a problem with testing but with its application.

It's just an extension of your code, if you are going to throw some code away to change an interface, then throw the tests away too. If you are afraid to do that because of the time spent, then you probably spent too much time on writing tests.

While I am dogmatic about tests, I also believe that around 50% code coverage is normally enough in most application codebases. Cover the important parts, the parts that are hard to test manually, the "core" pieces that lots of other areas rely on, and some tests for bugs that you want to prevent happening again.

If you want to quickly iterate, go for it! You shouldn't have all that many tests in the parts you are changing frequently. But to change a core aspect of the codebase, or a really complicated aspect of it, then the extra work of rewriting the tests shouldn't be all that bad.

I'd shorten your last statement. It's foolish to prioritize dogmatic beliefs. Everything has a time and a place, and moderation is key.

> Units covered 100%

This metric means very little. It does not measure the extent of code path coverage, among much else.

There is only one thing I have found code coverage is good for.

Code coverage can tell you what code is not tested at all.

Now, this is very useful. But:

Code coverage can't tell you what code _is tested_.

Code coverage can't tell you how _well_ code is tested.
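
A small sketch of that last point, in Python. The function `classify` and both helpers are hypothetical, and the helpers return booleans instead of using a test framework so the contrast is easy to see. Both helpers execute every line of `classify`, so a line-coverage tool would score them the same, but only one of them notices the deliberate bug.

```python
def classify(n):
    """Hypothetical function with a deliberate bug: 0 is reported as negative."""
    if n > 0:
        return "positive"
    return "negative"


def exercise_without_asserting():
    # Both branches run, so line coverage would be reported as 100%,
    # yet nothing is checked and the bug for n == 0 goes unnoticed.
    classify(5)
    classify(0)
    return True  # "passes" regardless of what classify() returned


def exercise_and_assert():
    # Identical coverage, but the outputs are compared with expectations.
    return classify(5) == "positive" and classify(0) == "zero"
```

Here `exercise_without_asserting()` reports success while `exercise_and_assert()` reports failure, even though a coverage report could not tell them apart.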

It's also a good way to find parts of the code that are not being used and are worth deleting.
Yes that is the essence of what I was saying.
>At the end of the day, there is a spectrum of tests going from unit tests to end-to-end tests. The spectrum represents several trade-offs, such as code locality vs. coverage. In my experience, the most economical approach is to write a balanced mix of tests that lie along this spectrum.

IME unit tests work acceptably in one very specific scenario and fail pretty badly in all others. That scenario being:

1) You're surrounding a self contained block of code that interacts "with the outside world" via a code API.

2) That code API is a very stable and clean abstraction.

3) It has minimal interactions with modules outside of it and those interactions that it does have are tightly scoped (i.e. minimal to zero mock objects are required to write the test).

4) The logic of the code is relatively complex and most bugs that crop up are logical in nature (e.g. off by one, things getting swapped around, incorrect calculations, wrong behavior with negative numbers).
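
As an illustration of code that fits all four conditions, consider a small pure function with boundary cases. This is only a sketch: `page_count` is a hypothetical helper, chosen because off-by-one mistakes are easy to make in it and no mocks are needed to test it.

```python
def page_count(total_items, page_size):
    """Number of pages needed to show total_items (hypothetical helper)."""
    if total_items < 0 or page_size <= 0:
        raise ValueError("total_items must be >= 0 and page_size > 0")
    # Ceiling division without floats; a plain // here is the classic off-by-one.
    return (total_items + page_size - 1) // page_size


def test_page_count_edges():
    assert page_count(0, 10) == 0
    assert page_count(1, 10) == 1
    assert page_count(10, 10) == 1   # exact multiple
    assert page_count(11, 10) == 2   # one past the boundary
    try:
        page_count(-1, 10)
    except ValueError:
        pass
    else:
        raise AssertionError("negative totals should be rejected")
```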

Meanwhile, integration tests (at varying levels) work well for pretty much every case apart from this and still work okay for this type of code. They make much more sense as a go-to default.

I've also worked on several projects where there was little to no code that it actually made sense to unit test. It's not uncommon that an entire codebase is predicated mainly on hooking systems together and doing some shallow calculations. IMHO, having zero unit tests in that environment is actually desirable.

The worst unit tests I've seen have been written when two or more of those preconditions have failed. They would fail constantly, require massive maintenance and, somewhat comically, almost never fail in the presence of an actual bug.

This is spot on IMO. I often see people touting the benefits of unit tests during a refactoring...but 95% of the time refactoring involves modifying class APIs since the hardest part of development is getting the object model right. Unit tests only assist refactoring when you don't modify the APIs - in other cases they are a burden.
Yes, if you change a unit's interface you will have to also change any code relying on said interface. That's a maintenance cost of unit tests.

But it doesn't follow that changing a unit's interface means unit tests suddenly become just a burden. Ideally unit tests are, well, testing a bunch of core functionality of the unit under test. You adapt them to the new interface. Then you're back to having a quick, automatic sanity check you can run against the unit whenever you have to make a change.

I don't understand people bemoaning this 'cost' of unit tests when the benefits they provide typically far outweigh the costs. It's possible broader functional/integration tests have a better ROI in certain situations, but they come with a maintenance cost as well.

>But it doesn't follow that changing a unit's interface means unit tests suddenly become just a burden.

If your refactoring is largely centered around changing unit interfaces (not uncommon) then it means that those unit tests are 100% overhead because most of the time they fail just because you changed the code.

>I don't understand people bemoaning this 'cost' of unit tests when the benefits they provide typically far outweigh the costs. It's possible broader functional/integration tests have a better ROI in certain situations, but they come with a maintenance cost as well.

I've certainly found that the ROI is a lot better. I find that integration tests have a higher up-front cost but maintenance-wise they're the same or cheaper. The percentage of failures that actually catch bugs is higher too.

I mostly agree, except that for parts of the code that meet these conditions, I aggressively unit test as the default. To the point that if an end-to-end test uncovers a potential bug in this component, my first action is to write a new unit test for this component inspired by the end-to-end test. If that unit test fails, then I can focus on just the unit test, and not worry about the end-to-end test until the unit test passes.

Another condition I would add to the list, which the component of mine I am thinking about meets is:

5) Failures in this component have cascading effects to multiple other components, causing seemingly "impossible" failures that are not obvious to others that they were caused by this component.

If your end to end test uncovers a bug in a lower level module that has a clean and stable abstraction, yes it makes sense to write a test that surrounds that instead.

That needn't be a unit test. It could simply be a lower level integration test.

In some cases maybe, but it's not effective in my case. In order to catch failures in this particular component, the most effective thing is to check that a large set of invariants is still true. An integration test of some kind only checks for behavior; does this component work when interacting with the rest of the system? That can show a failure exists, but is not useful for showing why. It's the unit tests which continually check the large set of invariants that can really get to the why: if a particular invariant is no longer true, the path to the problem is usually clear.

Maybe that should be a number 6): easily tested invariants.
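
A rough sketch of what invariant-checking unit tests can look like, in Python. The component here (`SortedSet`) is a hypothetical stand-in, not the component described above; the point is only that each invariant carries its own message, so a failure names the rule that broke.

```python
import bisect


class SortedSet:
    """Hypothetical component: a list kept sorted with no duplicates."""

    def __init__(self):
        self.items = []

    def add(self, value):
        i = bisect.bisect_left(self.items, value)
        if i == len(self.items) or self.items[i] != value:
            self.items.insert(i, value)

    def remove(self, value):
        i = bisect.bisect_left(self.items, value)
        if i < len(self.items) and self.items[i] == value:
            del self.items[i]


def check_invariants(s):
    # Each invariant gets its own message, so a failure names the broken rule.
    assert all(a < b for a, b in zip(s.items, s.items[1:])), "not strictly sorted"
    assert len(set(s.items)) == len(s.items), "duplicate element"


def test_invariants_hold_after_every_operation():
    s = SortedSet()
    ops = [("add", 3), ("add", 1), ("add", 3), ("remove", 1), ("add", 2)]
    for op, value in ops:
        getattr(s, op)(value)
        check_invariants(s)  # checked after each step, not only at the end
    assert s.items == [2, 3]
```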

Reading this makes me realize I had a bias against making lower level integration tests in this situation, definitely something I'll have to look for in the future.
That's exactly what the article is saying. Also, your point about unit tests assumes that tests cost the same to write. Sure, a unit test will have ten times better locality than an integration test, but you have to write ten of them to catch a bug. Also, unit tests don't catch bugs where they're very likely to crop up: the boundaries.
You have to write 10 of them if they are not part of your workflow. If you TDD (which is applicable in a lot of cases in my experience with web development), then instead of testing with Curl or a web browser, you write tests. This takes the same amount of time upfront, so the additional cost of writing the tests is actually 0.

Not sure about other types of development, but in web development, TDD can be a good way to have automated tests without the additional cost.

Yes, if you've already written the tests, the tests will take no time to write. That doesn't mean they were free the first time around.
What I mean is that compared to testing in browser, writing unit tests and using those tests instead of the browser takes about the same amount of time. So doing tests in browser or in TDD ends up taking the same amount of time upfront. So testing in browser is as expensive as writing unit tests with TDD.
The whole tests don't take much time to write is bullshit. There are always things to configure in the test framework and something to go wrong. Plus you actually have to write the tests.
What kinds of things do you configure? For us, the test environment is the same as the development environment, only we don't send out emails. Then there are some test scripts, but those are provided by Symfony and remain unchanged. So there is very little to set up.

And indeed, tests don't take much time to write once you get used to writing them. It's like anything. The more you do it, the better you get at it.

If you're talking about emulating browser requests by tests you are already at integration tests territory.
Oh, I guess you're right. I may have interchangeably used unit-tests and integration tests because in PHP/Symfony, they look pretty much the same: some setup code and "assert()" calls.

Sorry for any confusion for anyone reading my previous comments.

You can isolate your server code behind an interface that lets you test it in your IDE, using the same requests that would normally originate in the browser. You just translate each browser request into an API call, and test that API call. No browser needed.

Cake + Eat-it, too.
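
A minimal sketch of that idea in Python, under stated assumptions: `Request`, `Response`, and `handle` are hypothetical names, and a real framework would supply its own request and response types. The web layer would only translate a browser request into a `Request` and call `handle`, so the same call can be made directly from a test.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical stand-in for what the browser would send."""
    method: str
    path: str
    headers: dict


@dataclass
class Response:
    status: int
    body: str


def handle(request, valid_tokens):
    """Framework-free handler; the web layer only translates to and from it."""
    auth = request.headers.get("Authorization", "")
    if not auth.startswith("Bearer ") or auth[len("Bearer "):] not in valid_tokens:
        return Response(401, "unauthorized")
    if request.method == "GET" and request.path == "/profile":
        return Response(200, "profile data")
    return Response(404, "not found")


def test_handle_without_a_browser():
    tokens = {"secret"}
    authed = {"Authorization": "Bearer secret"}
    assert handle(Request("GET", "/profile", authed), tokens).status == 200
    # Edge cases such as a missing header need no browser tooling at all.
    assert handle(Request("GET", "/profile", {}), tokens).status == 401
    assert handle(Request("GET", "/nope", authed), tokens).status == 404
```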

From my experience trusting a unit test instead of looking at things in the browser is a certain recipe for introducing bugs. Not sure if this is what you are advocating or if I misunderstood.
That is what I do for most back-end tasks (not so much for front-end stuff yet although there is some interesting work happening in the open source world for that too). And I've actually had far fewer bugs testing with TDD than testing in browser.

In the browser, testing edge cases sometimes requires custom headers, encoding of data to create authorization tokens, etc. This means you have to either have amazing browser tools (which I've yet to find) or use a combination of browser and shell to achieve what I need.

In TDD, most setup can be automated with simple function calls. In addition, well made frameworks, such as PHP's Symfony, have utilities that even avoid making real HTTP requests, so tests run faster than using a browser, but the result is the same.

I'm not saying everyone should do TDD, but from my experience, it can lead to increased productivity and fewer bugs in some cases.

By the way, if you know of browser tools that make testing easier, I'd be glad to learn more. I use TDD because I lack the browser tools. If I had the right tools, maybe I'd consider going back to in browser testing.

Have you tried dusk? It's for laravel, but it could possibly be used on other projects, depends on your stack I guess.
Didn't know dusk. Thanks for sharing! It does look interesting! And Symfony seems to have its own equivalent, called Panther.
You still have to write the integration test to prove your units all work together.
I have been made aware further down that I might have confused unit and integration tests a bit. Sorry about that. In Symfony code, the two co-exist and look somewhat similar. Maybe that's why I don't really differentiate them.
To be honest I don't actually differentiate them either.

When my function under test calls sort() I don't fake that call out, so technically I just wrote an integration test. (In fact I work with one group that will inject a fake sort())
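
For illustration, here is roughly what the two styles look like in Python. `top_three` and `fake_sort` are hypothetical names, and making the sort injectable is an assumption about how such a group might structure the code.

```python
def top_three(scores, sort=sorted):
    """Return the three highest scores; `sort` is injectable (hypothetical design)."""
    return sort(scores, reverse=True)[:3]


def test_with_real_sort():
    # Uses the built-in sorted(), so strictly speaking two "units" are exercised.
    assert top_three([5, 1, 9, 7]) == [9, 7, 5]


def test_with_fake_sort():
    calls = []

    def fake_sort(items, reverse=False):
        calls.append((list(items), reverse))
        return [9, 7, 5, 1]  # canned answer, no sorting happens

    assert top_three([5, 1, 9, 7], sort=fake_sort) == [9, 7, 5]
    # The only thing really verified is how sort was called.
    assert calls == [([5, 1, 9, 7], True)]
```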

If I write a library foo which has sub-module bar which has class baz and fuzz, is the test for bar that tests both baz and fuzz a unit test or an integration test? Of course if you are a user of my library tests for foo are unit tests to you...

I once spent an evening reading and watching wise people on the internet to find out where the difference between unit tests and integration tests lies. It's really fuzzy and depends on the source.

There's no agreement on what a "unit" is. A few classes interacting with each other could still be a unit.

I didn't find any strict definition which would be useful in practice. I just write tests and guess they are mostly integration tests, some end-to-end tests and a few unit tests.

You are right! I work on test automation for end-to-end testing. I dare say my work is very useful and has prevented a lot of bugs from hitting our prod. But at my workplace we also all agree on the fact that it is often quite painful to locate the cause of the bugs that I report, precisely because we don't have enough unit tests on our old code.

Since we live in a world of limited budget and time to spend, I agree with the conclusion of the article: "use unit tests where it makes sense". It is what we try to implement on our new code (and when refactoring legacy code).

> The cost of fixing a discovered bug seems to grow exponentially with the amount of code in which the bug may appear.

That's completely at odds with my experience.

I find that for local bugs, the cost of locating them grows with O(log n) of the amount of code. And for non-local bugs (interface bugs, system bugs, incompatible specs...) unit tests don't catch them anyway.
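
One way to read the O(log n) figure is as bisection: repeatedly halving the set of candidate changes until the faulty one is isolated. That reading is an assumption, not something stated above. A minimal Python sketch, with a hypothetical `is_bad` check standing in for "run the failing test at this change":

```python
def first_bad(n_changes, is_bad):
    """Binary-search for the first bad change among n_changes ordered changes.

    Assumes that once a change is bad every later one is bad too, and that
    the last change is known to be bad (that is why we are searching).
    Returns (index_of_first_bad_change, number_of_checks_performed).
    """
    lo, hi = 0, n_changes - 1
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_bad(mid):
            hi = mid        # first bad change is at mid or earlier
        else:
            lo = mid + 1    # first bad change is after mid
    return lo, checks


# Hypothetical history: 1024 changes, the bug arrived with change 700.
index, checks = first_bad(1024, lambda i: i >= 700)
```

With 1024 candidates the search takes 10 checks, and doubling the amount of code to search adds only one more.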

> The cost of fixing a discovered bug seems to grow exponentially with the amount of code in which the bug may appear.

I don't think this is true. Fixing a bug is comprised of four parts.

1. Understanding and reproducing the bug.

2. Finding the code responsible for the bug.

3. Coming up with code that fixes the bug.

4. Verifying that your fixing code does not introduce any new bugs.

#2 is the only one that could even theoretically be exponential. The only bugs I've found where I've spent days on 2 were hard-to-reproduce, intermittent, race condition bugs, which unit tests aren't very good at finding anyway.

At the practical level, I find that most end-to-end tests tell you something failed, but not which part failed.

A unit test, or a component test, would tell you that something failed, and it's in this general area, which narrows down the search quite a bit.

They're both useful, but I've seen far more problems with people arguing that an end-to-end test is more than what they need, while a bad conversion from seconds to nanoseconds would have been caught quickly if a unit test had actually been written.
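
A sketch of the kind of unit test meant here, in Python. `seconds_to_nanos` is a hypothetical helper; the test pins the conversion factor directly, so a wrong factor fails in the unit itself instead of surfacing as a vague end-to-end failure.

```python
NANOS_PER_SECOND = 1_000_000_000


def seconds_to_nanos(seconds):
    """Convert whole or fractional seconds to integer nanoseconds (hypothetical helper)."""
    return round(seconds * NANOS_PER_SECOND)


def test_seconds_to_nanos():
    assert seconds_to_nanos(0) == 0
    assert seconds_to_nanos(1) == 1_000_000_000
    assert seconds_to_nanos(0.5) == 500_000_000
    # A slip such as multiplying by 1_000_000 (microseconds) fails right here.
    assert seconds_to_nanos(2) != 2_000_000
```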