by bluejekyll 2252 days ago
Right, the testing pyramid. Your largest set of tests should be a foundation of unit tests that run quickly. Then come integration and/or functional tests, which is the section this is talking about.

They are still necessary. Then, as you move up the stack, you get into the end-to-end tests of a fully running system.

The reason isn’t only speed but also signal to noise ratio. The further up the testing pyramid you go, the less clear it becomes where errors were introduced.


However, the further down the pyramid you go:

* The more unrealistic the tests generally become (larger % of false positives - tests that fail when they shouldn't - and false negatives - unit tests that simply don't catch bugs at all).

* The less reusable the test infrastructure becomes. Stubbing/mocking individual method calls to the database is an ongoing cost of development whereas building scripts to start the database and shut it down is an investment cost that pays dividends.
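A minimal sketch of the "investment cost" side of that second bullet, assuming SQLite as a stand-in for whatever database a project really uses. The helper `running_database` and the functions `add_user` and `count_users` are hypothetical names made up for illustration. The start/stop helper is written once and every test reuses it, whereas a stubbed `execute()` would have to be set up again per test.

```python
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def running_database():
    """Start a throwaway SQLite database, yield a connection, then tear it down.

    SQLite is an assumption here; it stands in for the project's real database.
    """
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(str(Path(tmp) / "test.db"))
        try:
            conn.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
            )
            yield conn
        finally:
            conn.close()


def add_user(conn, name):
    """Hypothetical code under test: insert a user and return the new row id."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid


def count_users(conn):
    """Hypothetical code under test: count the stored users."""
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


def test_add_user_is_persisted():
    with running_database() as conn:
        add_user(conn, "alice")
        assert count_users(conn) == 1


def test_name_is_required():
    # The real NOT NULL constraint fires here; a stubbed execute() would not
    # know about it unless someone remembered to encode it in the stub.
    with running_database() as conn:
        try:
            add_user(conn, None)
        except sqlite3.IntegrityError:
            pass
        else:
            raise AssertionError("expected the NOT NULL constraint to reject the row")
```

The second test is the kind of check that tends to be unrealistic with method-level stubs, because the constraint lives in the database rather than in the code under test.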

On the whole I think the pyramid idea enforces a wrong view that there is a "right" mix of "test levels" across any project. The best mix is determined by the kind of bugs and code you have (integration vs logical) and the kinds of abstractions you need or already have (in general, the worse your abstractions, the higher level you need your tests to be).

A lot of projects are best done with 100% integration tests while others can be done with 100% unit (especially small, self contained, simple-to-interact-with code bases that are 99% about calculations/logical decision making).

I've started to agree with this view of the test pyramid as well. This is a great video overall, but here's a part where Aslak Hellesoy (creator of Cucumber) is talking about how a better way to think about the test pyramid is as a [spectrum of speed & reliability](https://www.youtube.com/watch?v=PE_1nh0DdbY&t=12m55s)

If a test isn't a pure unit test, where all collaborators are stubbed out, but it is still fast and reliable, it is still a very valuable test. Possibly preferable since testing multiple collaborating objects / functions provides more confidence than just testing one by itself.
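As a rough illustration of a test that is not a pure unit test but is still fast and reliable, here is a sketch in which `Invoice` is exercised together with a real `TaxCalculator` instead of a stub. Both classes and `LineItem` are hypothetical, invented only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    name: str
    unit_price_cents: int
    quantity: int


class TaxCalculator:
    """Collaborator: pure and fast, with no I/O."""

    def __init__(self, rate_percent):
        self.rate_percent = rate_percent

    def tax_for(self, amount_cents):
        return amount_cents * self.rate_percent // 100


class Invoice:
    """Object under test; it uses a real TaxCalculator rather than a stub."""

    def __init__(self, calculator):
        self.calculator = calculator
        self.items = []

    def add(self, item):
        self.items.append(item)

    def total_cents(self):
        subtotal = sum(i.unit_price_cents * i.quantity for i in self.items)
        return subtotal + self.calculator.tax_for(subtotal)


def test_invoice_total_includes_tax():
    invoice = Invoice(TaxCalculator(rate_percent=10))
    invoice.add(LineItem("widget", unit_price_cents=250, quantity=4))
    # Subtotal is 1000 cents and tax is 100 cents.
    assert invoice.total_cents() == 1100
```

Nothing here touches the network or disk, so the test stays as quick as a stubbed one while also checking that the two objects agree on units and rounding.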

With all the projects I've worked on, I've always found unit tests to be the best possible place to catch bugs and errors, because it's faster and easier to diagnose the root cause.

That said, they don't cover all test cases, which is why it's a pyramid. What I have found is that when a bug arises and is caught in an integration test, it's often beneficial to create a unit test that helps catch the same error before you get to the integration test area. Not always, but definitely if you have something that fails more than once in an area. Unit tests should never have false negatives; there's something wrong with the test if that is happening.

That being said, tests are designed around the code that is being tested. As technical debt and refactoring of existing code happens, you do often need to rework tests. Many people allow tests to go unrefactored, and they become their own set of technical debt, but that doesn't mean that they don't have value.

>With all the projects I've worked on, I've always found unit tests to be the best possible place to catch bugs and errors

Have you considered that that might be due to the nature of the projects you've worked upon rather than the nature of unit tests themselves?

>it's often beneficial to create a unit test that helps catch the same error before you get to the integration test area, not always, but definitely if you have something that fails more than once in an area.

It depends upon what the bug was. Sometimes it's possible. Sometimes "replicating" it with a unit test is expensive and largely pointless since the unit test won't catch that class of bug in the future and will break as soon as you change the code (e.g. I've seen people try to create unit tests that mimic race conditions before and the results were horrendous to read, pointless, and didn't even catch race conditions).

>That being said, tests are designed around the code that is being tested. As technical debt and refactoring of existing code happens, you do often need to rework tests. Many people allow tests to go unrefactored, and they become their own set of technical debt

The higher level and the more behavioral the tests are, the less they have to be changed when the code is refactored and the more confidence that they give you that the code actually works afterwards.

The absolute worst situation to be in is with a bunch of unit tests that are tightly coupled to code that needs refactoring. Those unit tests' breakages signal nothing except that you've changed some code and they demand expensive repairs before breaking again in the future - again, because an API endpoint was refactored, not because a bug was introduced.
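A small sketch of that coupling, using a made-up `PriceFormatter` that is refactored from V1 to V2 without changing its output. The class names and the two check functions are hypothetical. The behavioral check only looks at the result, while the coupled check patches a private helper that the refactor removes.

```python
from unittest import mock


class PriceFormatterV1:
    """Original implementation: formatting goes through a private helper."""

    def _to_dollars(self, cents):
        return cents / 100

    def format(self, cents):
        return f"${self._to_dollars(cents):.2f}"


class PriceFormatterV2:
    """Refactored implementation: same output, helper inlined away."""

    def format(self, cents):
        return f"${cents // 100}.{cents % 100:02d}"


def behavioral_check(formatter_cls):
    """Asserts only on observable output, so it survives the refactor."""
    assert formatter_cls().format(1999) == "$19.99"


def coupled_check(formatter_cls):
    """Asserts on how the result is computed, so it is tied to one implementation."""
    with mock.patch.object(formatter_cls, "_to_dollars", return_value=19.99) as helper:
        formatter_cls().format(1999)
    helper.assert_called_once_with(1999)
```

Under these assumptions, `behavioral_check` passes for both versions, while `coupled_check` passes for V1 and raises `AttributeError` for V2 even though no bug was introduced.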

> Have you considered that that might be due to the nature of the projects you've worked upon rather than the nature of unit tests themselves?

Yes, and I have yet to see a set of code that doesn't benefit from some set of unit testing.

> I've seen people try to create unit tests that mimic race conditions before and the results were horrendous to read, pointless, and didn't even catch race conditions

I didn't say put in unit tests that are crap code that don't do what they're supposed to do. If the test doesn't catch what it is supposed to (i.e. you wrote the test and it didn't reproduce the error), then it's a worthless test.

> The higher level and the more behavioral the tests are, the less they have to be changed when the code is refactored and the more confidence that they give you that the code actually works afterwards.

> The absolute worst situation to be in is with a bunch of unit tests that are tightly coupled to code that needs refactoring.

Unit tests should be tied to the code they are testing. In general, if the API of the code changes, then the unit test will need to change; this is a no-brainer.

It sounds a bit like we're talking about language-level deficiencies vs. testing issues. Working in languages that do not have strong types, and a way to validate them, definitely makes unit tests harder to maintain when the API changes.

For typed languages that allow you to quickly discover usages of an API, it is far easier to maintain unit tests, as they tell you immediately what has changed and what needs to be updated, before ever running the tests.

> The reason isn’t only speed but also signal to noise ratio. The further up the testing pyramid you go, the less clear it becomes where errors were introduced.

I disagree. A failing unit test doesn't even necessarily indicate that an error was introduced: if a unit doesn't do what it's supposed to but the user doesn't see that, then an error wasn't introduced. Sure, when a unit test fails there's often an error, but if an end-to-end test fails, there's always an error--E2E tests are testing from the user's perspective, so what they're testing is actually errors. (This is assuming that both the unit tests and the E2E tests are correctly written).

You're positioning unit tests as a debugging tool, but I'd argue that there are much better debugging tools: REPLs and debuggers give you a lot more information than a unit test, and allow you to ask new questions quickly.

I don't want to come across as being anti-unit tests. On the contrary, I think unit tests are highly valuable. But I don't think the value comes from gathering information, debugging, or even catching bugs (in most cases). I think the value comes from a few things:

1. TDD forces you to design units for reuse from the start. Immediately you're using the code in two contexts: the application and the unit test. So right away your code is inherently reusable (in a binary sense) because de-facto you've re-used it. Reusability is more complicated than that (it's really more of a spectrum than a binary) but having at least two uses from the beginning pushes you toward the reusable side of the spectrum.

2. Unit tests act as living documentation for units. It is often unclear by reading code what the code does, because production concerns such as performance and security can lead you to do things in seemingly complicated ways. But unit tests don't have these concerns (at least not in the same way) so you can write code in unit tests that clearly communicates what a unit does. And unit tests don't fall out of sync with code like plain text documentation does.

3. TDD is incredibly motivating. Moving red->green on a quick cycle takes advantage of the dopamine reward system to increase productivity.
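To make the second point concrete, here is a hedged sketch in which the production code uses a terse bit trick and the tests spell out the intended behavior in plain terms. The function `is_power_of_two` and the test names are invented for illustration.

```python
def is_power_of_two(n):
    # Bit trick: a power of two has exactly one bit set, so clearing the
    # lowest set bit with n & (n - 1) leaves zero.
    return n > 0 and (n & (n - 1)) == 0


def test_one_is_a_power_of_two():
    assert is_power_of_two(1)


def test_exact_powers_are_accepted():
    for exponent in range(1, 20):
        assert is_power_of_two(2 ** exponent)


def test_neighbours_of_powers_are_rejected():
    assert not is_power_of_two(6)
    assert not is_power_of_two(1023)


def test_zero_and_negatives_are_rejected():
    assert not is_power_of_two(0)
    assert not is_power_of_two(-8)
```

The test names read as a short specification, and unlike a prose description they fail if the implementation stops matching them.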

Unit tests on code that has no dependencies (and I mean no dependencies, neither injected nor direct) are great.

Unit tests on code with dependencies (whether injected so that they can be mocked, or directly referenced so that they end up more like mini integration tests) are less excellent. They're brittle, inhibit refactoring, and either don't test as much as you think they do (if mocking dependencies) or are slow (if not mocking).
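One way the "don't test as much as you think" problem can show up, sketched with Python's `unittest.mock`. `PaymentGateway` and `checkout` are hypothetical, and the bug (calling a method the real class does not have) is planted on purpose. A plain `Mock` accepts any call, so the test passes; `create_autospec` or the real object exposes the mistake.

```python
from unittest import mock


class PaymentGateway:
    """Hypothetical real dependency. Note that the method is named `charge`."""

    def charge(self, amount_cents):
        return f"charged {amount_cents}"


def checkout(gateway, amount_cents):
    """Code under test, with a planted bug: `submit_charge` does not exist."""
    return gateway.submit_charge(amount_cents)


def passes_with_loose_mock():
    gateway = mock.Mock()  # accepts any attribute access or call
    checkout(gateway, 500)
    gateway.submit_charge.assert_called_once_with(500)
    return True


def fails_with_autospec():
    # The autospecced mock only exposes attributes the real class has.
    gateway = mock.create_autospec(PaymentGateway, instance=True)
    try:
        checkout(gateway, 500)
    except AttributeError:
        return True
    return False


def fails_with_real_dependency():
    try:
        checkout(PaymentGateway(), 500)
    except AttributeError:
        return True
    return False
```

All three functions return `True` here: the loose mock lets the buggy code through, while the stricter double and the real dependency both reject it.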

The further up the testing pyramid you go, the less work it is to refactor things, because you don't need to rewrite as many tests. OTOH, tests are more complex to write and take longer to run.

And now I get to my point: I don't think the blanket statement of "your largest set of tests should be ... unit tests that run quickly" is well-founded. There are trade-offs, and they shouldn't be trivially waved aside.

Do people usually mock the database at the integration / functional level? I haven't seen that... I interpreted this to mean the unit test level, which similarly to the thread starter, seems crazy to me.

If you think of the DB like a service, then yes, people sometimes mock it. IMO, the danger comes when you start trying to mock things and act like a DB, i.e. you create some in-memory store to act like a DB. That's dangerous because then you're potentially introducing issues that are completely unrelated to the DB operations, and therefore not testing anything of value.

But it can be far cheaper to generate data through a mocked interface than, say, to fill a DB with data and test against that data set. Obviously there are ways to structure your code such that the DB isn't part of the data flow at all, but sometimes existing code structure isn't perfect.
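A minimal sketch of generating data through a mocked interface, assuming the logic under test depends on a narrow interface rather than on a DB handle. `OrderSource`, `CannedOrders`, and `lifetime_value` are hypothetical names. The double only hands back fixed data and makes no attempt to behave like a database.

```python
from typing import Iterable, Protocol


class OrderSource(Protocol):
    """Hypothetical interface the business logic depends on instead of a DB handle."""

    def orders_for(self, customer_id: int) -> Iterable[int]:
        """Return order totals in cents for one customer."""
        ...


class CannedOrders:
    """Test double that returns fixed data; it does not imitate query semantics."""

    def __init__(self, totals_by_customer):
        self._totals = totals_by_customer

    def orders_for(self, customer_id):
        return self._totals.get(customer_id, [])


def lifetime_value(source: OrderSource, customer_id: int) -> int:
    """Code under test: pure logic over whatever the source returns."""
    return sum(source.orders_for(customer_id))


def test_lifetime_value_sums_orders():
    source = CannedOrders({7: [1200, 800, 50]})
    assert lifetime_value(source, 7) == 2050


def test_unknown_customer_has_zero_value():
    assert lifetime_value(CannedOrders({}), 99) == 0
```

Setting up a case is a one-line dictionary instead of a seeded data set, at the cost that nothing about the real queries is being verified.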