Hacker News
by humanrebar 2577 days ago
> Most of the time it's the test itself that is flaky

I have always understood that unit tests must inherently be deterministic for the reason you explain.

A small test that is not deterministic is testing something other than "the unit" since there is another independent variable unaccounted for, often the state of the database or the configuration of a test environment.
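To make the "independent variable unaccounted for" point concrete, here is a minimal sketch in Python. The `greeting` function and its `now` parameter are made up for illustration; the idea is only that a hidden input (here, the wall clock) becomes an explicit one the test controls.

```python
from datetime import datetime, timezone


def greeting(now=None):
    """Return a greeting based on the hour of day.

    `now` is a hypothetical injection point: production callers omit it,
    tests pass a fixed datetime so the result cannot vary between runs.
    """
    if now is None:
        now = datetime.now(timezone.utc)  # the hidden variable, when not injected
    return "good morning" if now.hour < 12 else "good afternoon"


def test_greeting_is_deterministic():
    # Fixed inputs: the outcome no longer depends on when the test runs.
    morning = datetime(2020, 1, 1, 9, 0, tzinfo=timezone.utc)
    evening = datetime(2020, 1, 1, 18, 0, tzinfo=timezone.utc)
    assert greeting(morning) == "good morning"
    assert greeting(evening) == "good afternoon"
```

A test that called `greeting()` with no argument would pass or fail depending on the time of day, which is exactly the kind of unaccounted-for state described above.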

Not that unit tests are perfect. Unit testing a concurrent data structure without threads (which are inherently nondeterministic) is not especially useful.


I see a tension between determinism and mocking. Classic TDD dogma says to mock super close to the unit under test, so that the only logic in play is the logic within that unit. Which is all well and good, but there's lots of code out there where the stuff that breaks is the stuff on the interfaces; once you mock that out, you've removed a significant chunk of what might legitimately break, and therefore diminished the value of the test.

So it's a balance. Sometimes it really is worth it to just attack that one function with its weird snarl of if statements and initial conditions— totally. But there are other cases where part of what you want is to inspect what happens in the adjacent object, on a different thread, as a result of stimulating something under test conditions. This isn't wrong, and these kinds of tests can be really hard to get completely deterministic, especially if the CI environment is some heavily-loaded VM host with totally different thread switching characteristics from your laptop.
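One way to soften the thread-timing problem, sketched in Python under made-up names (`Recorder`, `publish_async`): instead of sleeping for a fixed interval and hoping the other thread has run, the test blocks on an explicit signal with a generous timeout. This does not make the test fully deterministic, but scheduling differences between a laptop and a loaded CI host only matter if the wait exceeds the timeout.

```python
import threading


class Recorder:
    """Hypothetical adjacent object that receives values on another thread."""

    def __init__(self):
        self.values = []
        self.got_value = threading.Event()

    def on_value(self, value):
        self.values.append(value)
        self.got_value.set()  # signal the waiting test


def publish_async(recorder, value):
    """Hypothetical code under test: hands the value off on a worker thread."""
    worker = threading.Thread(target=recorder.on_value, args=(value,))
    worker.start()
    return worker


def test_adjacent_object_sees_value():
    recorder = Recorder()
    worker = publish_async(recorder, 7)
    # Wait for the signal rather than sleeping for a guessed duration.
    assert recorder.got_value.wait(timeout=5.0)
    worker.join()
    assert recorder.values == [7]
```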

I have come to conclude that excessive mocks are a symptom of poor architecture.

Classic TDD as you describe (see the other reply, classic TDD is different) works great for algorithms: take some data, manipulate it, and get different data out. There is no need for mocks. This is where your business logic should be, and it is easy to test.
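As a small illustration of "data in, different data out" in Python (the `apply_discount` function is invented for the example), note that nothing here needs a mock:

```python
def apply_discount(total_cents, percent):
    """Hypothetical business rule: take a whole-percent discount off a total.

    Pure function: the result depends only on the arguments.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - (total_cents * percent) // 100


def test_apply_discount():
    assert apply_discount(1000, 15) == 850
    assert apply_discount(1000, 0) == 1000
    assert apply_discount(1000, 100) == 0
```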

However, this fails in the real world because algorithms are but a minority of code: most code in my experience is just moving data around from subsystem to subsystem, and to external collaborators. Here you do have collaborators, and the interactions are the point. Mocks now start to make sense because the point is that my subsystem delivers data to that something else, and I shouldn't know or care what that something else is.

I've seen the above fail in several ways. I've seen people mock the communication away from their algorithm, but in practice the communication and the algorithm are tightly coupled anyway, so changes in one will change the other.

Worse, I see many people test not the subsystem boundaries, but boundaries within the subsystem. That is, they start writing the subsystem, and then realize (correctly) that they need to break the subsystem up, then they test the subsystem as it is broken down. This seems good, but it leads to brittle systems that cannot be changed, because the sub-subsystem is now not allowed to change without breaking tests.

To understand this, remember, a test is an assertion that something will not change. Thus if you mock a collaborator you are asserting that the collaborator is a different subsystem and you are not allowed to refactor across this boundary. If the boundary is not an architecture boundary you shouldn't mock it because you might want to change it.
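A rough Python sketch of that rule, with invented names (`OrderService`, `notifier`, `_total`): the notifier stands in for a separate subsystem, so it is mocked, while the internal helper is left real so it stays free to be refactored.

```python
from unittest.mock import Mock


class OrderService:
    """Hypothetical subsystem; `notifier` is assumed to be an architecture boundary."""

    def __init__(self, notifier):
        self._notifier = notifier

    def place(self, order_id, items):
        total = self._total(items)
        self._notifier.send(order_id, total)  # the cross-subsystem interaction
        return total

    def _total(self, items):
        # Internal detail: deliberately NOT mocked, so it can be inlined,
        # renamed, or split later without breaking the test.
        return sum(price * qty for price, qty in items)


def test_place_notifies_other_subsystem():
    notifier = Mock()
    service = OrderService(notifier)
    assert service.place("o-1", [(250, 2), (100, 1)]) == 600
    # The only thing pinned down is the contract across the boundary.
    notifier.send.assert_called_once_with("o-1", 600)
```

Had the test patched `_total` as well, it would be asserting that the split between `place` and `_total` must never change, which is the brittleness described above.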

> Classic TDD dogma says to mock super close to the unit under test, so that the only logic in play is the logic within that unit.

I suppose it depends on your definition of "classic TDD dogma". Mocking really wasn't a thing until TDD had been around for about 5-10 years, so super classic TDD dogma has always been "don't mock" ;-)

"London School", GOOS, Outside-in approach has always been to mock heavily. I call it "wish based programming". You write a test, wishing that you had some facility and since you don't have it, you mock it. Then once the test is in place, you can write your code and eventually write production code that represents the mock (and personally, I remove the mock at that point).
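A sketch of that workflow in Python, with every name invented for illustration (`convert_invoices`, `rate_for`, `TableRateLookup`). First the test "wishes" for a rate lookup that doesn't exist yet and mocks it; later the real class is written and the mock is dropped from the test.

```python
from unittest.mock import Mock


def convert_invoices(invoices, rates):
    """Code under test; `rates` is the wished-for facility with a rate_for() method."""
    return {name: amount * rates.rate_for(currency)
            for name, amount, currency in invoices}


def test_with_wished_for_lookup():
    # Step 1: the lookup doesn't exist yet, so stand in a mock for it.
    rates = Mock()
    rates.rate_for.return_value = 2
    assert convert_invoices([("acme", 10, "EUR")], rates) == {"acme": 20}


class TableRateLookup:
    """Step 2: production code written later to fulfil the wish."""

    def __init__(self, table):
        self._table = table

    def rate_for(self, currency):
        return self._table[currency]


def test_with_real_lookup():
    # Step 3: same assertion, mock removed.
    rates = TableRateLookup({"EUR": 2})
    assert convert_invoices([("acme", 10, "EUR")], rates) == {"acme": 20}
```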

It was really after that, as far as I can tell, that people started to get the idea that you should mock all your collaborators in order to isolate your units. This kind of isolation was never a thing originally (see Kent Beck's original book on the subject). If you watch DHH's conversations with Kent Beck (and I think Martin Fowler???) on the topic, they state pretty clearly that "Chicago School" is to avoid mocking except as a last resort (my own personal preference as well). Also take a look at Michael Feathers's discussion in his Legacy Code book for a good description of what the original ideas of fakes, stubs and mocks were. These days those definitions are practically lost.

I'm not sure why there has been this idea that mocking was always a part of TDD, but it definitely is a popular notion.

Mocking a module's dependencies decouples the module from the dependency modules. To me, that's the payoff of mocking. And mocks only really click for me in the wider scope of "Outside-In"-style TDD.

It's the black box/functional/integration test that exercises the production code from the standpoint of the end user that proves whether a tested module's dependencies actually satisfy their contracts. Also, the functional tests are the only place where you can discover whether DevOps work is needed before deploying into a real test/prod environment. Plus, the functional test captures the user story that we're focused on in a way unit tests cannot, so the functional tests direct the overall work.

I must have those functional tests in place before I do my unit tests. Otherwise, those mocks really are creating a wish-based programming system.

I agree with you that mocking of today is indistinguishable from what Michael Feathers described in his Legacy Code book (which is excellent regardless, BTW.) Mocking today is so easy to express and change and grok with tools like Spock Framework.

Interesting, thanks for the history lesson— I feel a bit better about my own stance, which is also largely to mock as a last resort.

Although I've never been a Ruby programmer, you're right that I'm influenced by DHH and the Ruby community's approach on a lot of these things.

I wasn't arguing against less deterministic tests. I was just saying "unit test" isn't the name for them. Call them "small tests" or "smoke tests" or make up a new term.
Not all tests are unit tests. I had a property test I was running that I eventually just turned off because it was working just fine on everyone's machine but would fail 60% of the time on Travis due to timeout issues. That was up from 30% before Travis was sold; I suspect they are skimping on AWS. I probably should have written a more effect-dependent timeout, but it was hard to justify recoding something when your test is long and your retrigger is via Travis.