But the goal of tests isn't just to find bugs in newly-written code. It's to defend against regressions, and a regression is even harder to localize, given that the team may not even be familiar with the failing code.
This is a spot where the potential for test-induced design damage comes into play.
With bite-size integration tests, I find it's generally not too hard to isolate the cause of a failing test, because the code it's testing tends to be straightforward, and fairly easy to step through, if necessary.
I frequently have a harder time with unit tests. The code ends up involving a lot of extraneous abstractions that I need to think through. The test code is often so heavily mocked that it's hard to distinguish the behavior under test from stuff that's just being mocked or stubbed to get the SUT to run cleanly, meaning I've got to start with trying to figure out whether the bug is in the test or the code being tested.
It gets worse in long-lived code bases, where the unit tests are often subject to significant bit rot on account of how brittle they are. I've definitely had some code archaeology excursions reveal that the reason an entire suite of tests was tautological is that someone was doing a nominally unrelated refactor and just put in the minimum effort necessary to get the tests to go green again.
You can argue that developers need to be more diligent. Me, though, I figure it's sort of like those lines of bare dirt you see crisscrossing the lawns of university campuses: when things get to that point, it's a sign that the official way of getting around isn't appropriate to most people's real needs.
I should say, I was complaining there of code that is pervasively unit tested, not unit tests in general.
I do think it's important to have unit tests when the unit's behavior is complicated. Where I start to get worried is when there are unit tests being written against classes that have very little behavior that doesn't involve interaction with some other module.
> I do think it's important to have unit tests when the unit's behavior is complicated. Where I start to get worried is when there are unit tests being written against classes that have very little behavior that doesn't involve interaction with some other module.
I think this is precisely where unit test suites start to have problems. Good, flexible unit testing requires a lot of judgement about what will be useful to test and what will be too much of a burden in the future. Unfortunately, judgement is hard to acquire and even more difficult to teach, and a lot of teams want to create and enforce over-dogmatic testing "standards." When unit testing, you have to balance:
1. What testing do I need to have confidence my code is working?
2. What testing do I need to catch likely regressions?
3. What kinds of tests will just get in my way in the future or are literally useless [1]?
[1] E.g. unit tests that essentially only test core language functionality, once you take out all the mocks.
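To make the footnote concrete, here is a minimal sketch of the kind of test it describes. `OrderService` and its `fetch_total` collaborator are hypothetical names invented for illustration; the only real API used is `unittest.mock.Mock` from the standard library.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical pass-through class: all real work happens in the repository."""

    def __init__(self, repository):
        self.repository = repository

    def get_total(self, order_id):
        return self.repository.fetch_total(order_id)


def test_get_total_tautological():
    repo = Mock()
    repo.fetch_total.return_value = 42  # the test supplies the answer itself
    service = OrderService(repo)
    # Once the mock is taken out of the picture, all this checks is that
    # a Python method returns the value it was handed.
    assert service.get_total("order-1") == 42


if __name__ == "__main__":
    test_get_total_tautological()
```

The test goes green no matter what a real repository would do, so it adds to the coverage number without saying much about whether totals are actually correct.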
This is a really important aspect of good testing that doesn't seem to get as much attention as it deserves - the tests that have brought the most value for me are the ones that assert the business requirements, not the implementation details.
So for a little passthrough/orchestration class, it probably doesn't make sense to do much testing. For something that actually performs business logic, that's a prime candidate for testing. I've seen plenty of tests that just seem to aim to increase coverage (heck, I've written plenty of those myself), but at the end of the day, the benefit they serve after being written is probably minimal.
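As a rough sketch of what "asserting the business requirement" can look like, assume a made-up rule: orders of 100 or more get 10% off. The function name, threshold, and rate are all assumptions for the example, not anything from a real code base.

```python
from decimal import Decimal


def order_total(prices, discount_threshold=Decimal("100"), discount_rate=Decimal("0.10")):
    """Hypothetical rule: subtotals at or above the threshold get the discount."""
    subtotal = sum(prices, Decimal("0"))
    if subtotal >= discount_threshold:
        return subtotal * (1 - discount_rate)
    return subtotal


def test_discount_applies_at_threshold():
    # States the requirement directly: exactly 100 qualifies for 10% off.
    assert order_total([Decimal("100")]) == Decimal("90")


def test_no_discount_below_threshold():
    assert order_total([Decimal("99.99")]) == Decimal("99.99")


if __name__ == "__main__":
    test_discount_applies_at_threshold()
    test_no_discount_below_threshold()
```

Neither test cares how the subtotal is computed internally, so the implementation can be refactored freely and the tests only fail if the rule itself changes or breaks.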
I agree that regression scenarios are difficult to identify; this is why it's good to have unit tests to begin with.
You're only unit testing the code as you 'intended' it to work at that time. Even with the tests written, it wouldn't be uncommon for a bug to slip through when the code actually runs. What you can do then is write another test to cover that scenario. Repeat, and your code becomes more robust as a result.
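A small sketch of that loop, using a hypothetical `parse_quantity` helper. Assume the original version was just `int(text)`, which raised `ValueError` when a user typed "1,000"; the fix and the test that pins it down are added together.

```python
def parse_quantity(text):
    """Hypothetical helper: parse a user-entered quantity as a non-negative int."""
    # Fix for the (assumed) reported bug: drop thousands separators first.
    value = int(text.replace(",", ""))
    if value < 0:
        raise ValueError("quantity must be non-negative")
    return value


def test_parse_quantity_plain():
    # The original test: the input the author had in mind.
    assert parse_quantity("3") == 3


def test_parse_quantity_accepts_thousands_separator():
    # Regression test added after the bug slipped through.
    assert parse_quantity("1,000") == 1000


if __name__ == "__main__":
    test_parse_quantity_plain()
    test_parse_quantity_accepts_thousands_separator()
```

The second test encodes a scenario nobody anticipated up front, which is why it tends to be more valuable than another test of the happy path.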
Correct that the goal is to prevent regressions. I claim (I don't know how to study this) 80% of your tests will never fail and so they could safely be deleted - but I have no insight into which tests will fail so I say keep them all.
Incorrect because in fact it isn't hard to localize failures: it is something in the code you just touched!
Yes, but you also need to localize the effect of the bug to know why the code you changed broke the program (and remember that we're talking about a case where the rest of the program isn't familiar enough to you). Good unit tests can help you find the immediate effect of the failure, rather than the ultimate one.
I don't understand how you could live the experience that the problem is always with the just-edited code. The closest I can come is supposing that you've always worked with thoroughly unit-tested code (and correct, well-documented libraries, etc.).
I'm lucky enough that the code base I work on was a "big rewrite" not long ago, with the goal of having everything tested. Also, we are an embedded system, where we know exactly which version of each library to support, and upgrading any library is in itself a big deal done as a separate exercise.