by jeremyloy_wt 849 days ago
Counterpoint when using mocks: if B's behavior changes, one may not remember to update A's test, which would then be falsely passing.

This problem is exacerbated if B is a popular object used by many components.

IMO if you own A and B, never use a mock. Possibly write a Fake B if B is non-deterministic or slow. Then write a parity test for B and Fake B.
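
The falsely passing test described above can be sketched roughly like this. `PriceService` (B) and `Checkout` (A) are hypothetical names made up for illustration, and the dollars-to-cents change is just an assumed example of B's behavior changing:

```python
from unittest.mock import Mock


class PriceService:
    """Hypothetical B. Its behavior changed: it used to return dollars, now cents."""

    def price_of(self, sku: str) -> int:
        return 1999  # cents; the old version returned 19.99


class Checkout:
    """Hypothetical A. Still assumes B returns dollars."""

    def __init__(self, prices):
        self.prices = prices

    def total(self, skus):
        return sum(self.prices.price_of(sku) for sku in skus)


def total_with_mock():
    # The mock still encodes B's OLD behavior, so A's test keeps passing.
    stale_b = Mock()
    stale_b.price_of.return_value = 19.99
    return Checkout(stale_b).total(["book"])


def total_with_real_b():
    # The real B exposes the mismatch: A now reports cents as if they were dollars.
    return Checkout(PriceService()).total(["book"])
```

A test asserting `total_with_mock() == 19.99` stays green after B changes, while the same assertion against the real B would fail.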


> Counterpoint when using mocks: if B's behavior changes, one may not remember to update A's test, which would then be falsely passing.

> This problem is exacerbated if B is a popular object used by many components.

This is a great addition to the caveats I mentioned.

> IMO if you own A and B, never use a mock.

I'm going to take issue with your use of the word "never" here.

If you said something like, "If you own A and B, don't use mocks the vast majority of the time", I'd even agree--but there are important exceptions (and I'd argue that non-deterministic B is one such exception).

> Possibly write a Fake B if B is non-deterministic or slow. Then write a parity test for B and Fake B.

Would you mind explaining what you mean here?

> I'm going to take issue with your use of the word "never" here.

Perfectly fair. I shouldn’t have stated it as an absolute. If I could still edit it, I would change my statement to “don’t default to a mock”

> Would you mind explaining what you mean here?

Gladly. Martin Fowler provides really clear definitions for the different types of test doubles.

https://martinfowler.com/articles/mocksArentStubs.html

Keeping your Fakes correct takes work. An easy way I've found to do that is to run the same test, with the exact same set-up and assertions, against both the "real" component and the Fake. If that test breaks, then you know that consuming code should break too.
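
A minimal sketch of such a parity test, under some assumptions: `SqliteStore` stands in for the "real" component, `FakeStore` is the Fake, and the key-value interface is invented purely for illustration. The point is only that `check_store_contract` is run unchanged against both.

```python
import os
import sqlite3
import tempfile


class SqliteStore:
    """Stand-in for the "real" component: persists to a SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key, value):
        self.conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


class FakeStore:
    """The Fake: same interface, backed by a dict."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


def check_store_contract(store):
    """One set of set-up and assertions, run unchanged against either store."""
    assert store.get("missing") is None
    store.put("a", "1")
    assert store.get("a") == "1"
    store.put("a", "2")  # overwriting must replace, not duplicate
    assert store.get("a") == "2"


def run_parity_test():
    with tempfile.TemporaryDirectory() as tmp:
        real = SqliteStore(os.path.join(tmp, "kv.db"))
        try:
            check_store_contract(real)
        finally:
            real.conn.close()  # close before the temp directory is removed
    check_store_contract(FakeStore())
```

If the real store's behavior changes, the shared contract fails and the Fake has to be updated to match, which is what keeps tests built on the Fake honest.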

Okay, I think I understand. In Fowler's terminology it appears I've been using stubs and mocks and calling them both mocks (shrug).

I haven't used fakes so take these critiques with a grain of salt, but I have two concerns with fakes:

1. The process you're describing to verify fakes does take work, and sounds suspiciously like it veers into testing your tests. That's probably worth it for applications like the space shuttle or an X-ray machine that need an extreme degree of reliability, but seems pretty overkill for the kinds of applications that make up most of the work in the software industry.

2. More often than not, I can't think of a useful fake for most things I'd use mocks for. The in-memory database mentioned by Fowler is an exception: you're skipping writing to disk, which saves a lot of time. But in most cases, the fastest implementation of a unit is... the unit, because if it wasn't the fastest you'd use the fake instead of the unit.

And all this sort of sets aside the larger issue, which is that there are plenty of cases where the unit you want to test isn't the interface between A and B, so having a real B in that test just adds unwanted dependencies. The ideal here is that if you break the interface between A and B, one test fails--the test of that feature of the interface between A and B--which tells you exactly where the bug is. With an LLM, for example, this could look like calling the real LLM and just testing that the call doesn't raise any errors and returns syntactically valid output (even if the output is unpredictable, you can verify the syntax). If you're using fakes everywhere, then a failure of the interface between A and B is going to cause failures in a bunch of A tests, which makes debugging harder, not easier.
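
The LLM case might look something like the sketch below. Everything here is an assumption for illustration: `call_llm` is a local stand-in with random output rather than a real client, and JSON with `label` and `confidence` keys is just one possible meaning of "syntactically valid output".

```python
import json
import random


def call_llm(prompt):
    """Stand-in for a real LLM call (hypothetical); output varies per call."""
    label = random.choice(["positive", "negative", "neutral"])
    return json.dumps({"label": label, "confidence": round(random.random(), 3)})


def check_llm_interface(generate):
    """Interface test: the call must not raise and must return well-formed output.

    Only the syntax is asserted, never the unpredictable values.
    """
    raw = generate("Classify: 'great product'")
    parsed = json.loads(raw)  # raises ValueError on syntactically invalid JSON
    assert isinstance(parsed, dict)
    assert {"label", "confidence"} <= parsed.keys()
    return parsed
```

This is the single test that should fail when the interface to the LLM breaks; tests of the code around it can then use a mock or stub without re-testing the interface.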

The case of the in-memory database makes sense because a general purpose fake (the in-memory database) can be written by a third-party and act as an effective mock for all the places you're using the DB--it's easier to use the in-memory database than to mock all the places where you call the DB (and notably, I've never seen anyone write a bunch of tests to test the in-memory database as you describe being a good idea for fakes). And in most cases, I'm using mocks because it's not easier to use the real thing.

The last point of that comment is the core of "never use mocks": implementing a fake that behaves exactly like the real implementation means that you can run a test that passes for both the fake B and B.

E.g. a simple example is having a FakeSentimentAnalysis that behaves exactly like SentimentAnalysis for the classes it can return. You can then have an identical (parametrized) test that works on both, and from there on you can trust that this test will break when the fake diverges from the real implementation, without having to worry about mocks littered through the code.
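
A rough sketch of that parametrized test, using `unittest` subtests. Only the two class names come from the comment above; the `classify` method, the word lists and the canned answers are assumptions, and the "real" class is a toy stand-in for whatever slow model call it would actually make.

```python
import unittest


class SentimentAnalysis:
    """Toy stand-in for the real implementation; imagine a slow model call."""

    POSITIVE = {"great", "good", "love"}
    NEGATIVE = {"awful", "bad", "hate"}

    def classify(self, text):
        words = set(text.lower().split())
        score = len(words & self.POSITIVE) - len(words & self.NEGATIVE)
        if score > 0:
            return "positive"
        if score < 0:
            return "negative"
        return "neutral"


class FakeSentimentAnalysis:
    """Fake: canned answers for known inputs, same classes as the real one."""

    CANNED = {"I love it": "positive", "I hate it": "negative"}

    def classify(self, text):
        return self.CANNED.get(text, "neutral")


class SentimentParityTest(unittest.TestCase):
    CASES = [
        ("I love it", "positive"),
        ("I hate it", "negative"),
        ("It exists", "neutral"),
    ]

    def test_same_behaviour(self):
        # Identical cases and assertions for both implementations.
        for impl in (SentimentAnalysis(), FakeSentimentAnalysis()):
            for text, expected in self.CASES:
                with self.subTest(impl=type(impl).__name__, text=text):
                    self.assertEqual(impl.classify(text), expected)
```

Other tests can then take `FakeSentimentAnalysis` as a drop-in and rely on the parity test to flag any divergence.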

I like to push that even further and implement fakes inside the code/library/service that implements the real thing too: this ensures interfaces are really identical, and you only need to structure the code using the same approach.

As this is all code under your control, that is an ideal which is not hard to achieve if the entire team accepts that mindset.

But more often than not, particularly when interfacing with remote services, you don't want 'FakeSentimentAnalysis' to behave exactly like SentimentAnalysis. You want it to do crazy and unexpected things exactly unlike SentimentAnalysis to ensure that you properly handle the failure conditions that shouldn't, but theoretically could, occur when using SentimentAnalysis. You almost don't even need to worry about the cases where SentimentAnalysis is working as expected. It's the failure states that are of top concern.

As we are talking about "code under your control", I don't see a conflict there. You seem to be assuming that I was suggesting the Fake should only implement the happy path. On the contrary, it's quite easy to have the fake exhibit failing behaviour; what might be harder in that case is having the same test work against both the fake and the real implementation (e.g. simulating a network connection issue is easier in a fake than with a real implementation).

I generally do that by having a test that works both against a fake and against the actual implementation, a bunch of others that only use fakes, and a few system/e2e tests covering the whole thing.

With not much effort, you get increased trust in the code you write.

But most notably, it makes you write testable code, which I think is the most maintainable and most readable code to write.

> you seem to be assuming that I was suggesting the Fake should only implement the happy path

There is no such assumption. The assumption is that if you rely on a fake to service all your testing needs it cannot be tested against the real implementation as, in many cases, you do not want it to work the same way.

I don't know if a network connection failure is the greatest example, but let's run with it. Why bother adding simulated network failure into your fake, which, due to the problems you point out, won't be touched by your double-duty test suite anyway, when you can just create an additional mock that does nothing but simulate network failure? Why add needless complexity to the fake? You haven't gained a testability advantage.
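
For reference, a mock that does nothing but simulate network failure can be as small as the sketch below. `safe_sentiment` and the `classify` method are hypothetical consuming code and interface, assumed only for illustration; `Mock` and `side_effect` are the standard `unittest.mock` API.

```python
from unittest.mock import Mock


def safe_sentiment(client, text):
    """Hypothetical consuming code: falls back to "unknown" if the service is unreachable."""
    try:
        return client.classify(text)
    except ConnectionError:
        return "unknown"


def test_network_failure_is_handled():
    # The mock has exactly one job: raise as if the network were down.
    failing_client = Mock()
    failing_client.classify.side_effect = ConnectionError("simulated outage")

    assert safe_sentiment(failing_client, "I love it") == "unknown"
    failing_client.classify.assert_called_once_with("I love it")
```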

That's not to say that a fake isn't also useful, but you haven't made a case for why it has to be the be all and end all. You can use both.

Let's imagine you hard-code a network failure in your fake for a particular domain, say "this-domain-fails.com", but it otherwise "works" for all the other domains. While your double-duty test can't confirm that your real implementation handles the failure properly, it will confirm that your fake otherwise works quite similarly to the real implementation for other domains. And you'd test the failure condition with a separate test, with the fake set up in exactly the same way as in the double-duty test (e.g. with a fixture).
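
A sketch of that arrangement, with the caveat that all names except the failing domain are made up here: `HttpFetcher` stands in for the real implementation, `make_fetcher` plays the role of the shared fixture, and `is_reachable` is hypothetical consuming code.

```python
import urllib.request


class HttpFetcher:
    """Stand-in for the real implementation: performs an actual HTTP GET."""

    def fetch(self, domain):
        with urllib.request.urlopen(f"https://{domain}/", timeout=5) as resp:
            return resp.read().decode("utf-8", errors="replace")


class FakeFetcher:
    """Fake: "works" for every domain except one hard-coded failing domain."""

    FAILING_DOMAIN = "this-domain-fails.com"

    def fetch(self, domain):
        if domain == self.FAILING_DOMAIN:
            # Raised at the point where the real code would go out over the network.
            raise ConnectionError(f"simulated failure for {domain}")
        return f"<html>fake page for {domain}</html>"


def make_fetcher(use_real=False):
    """Shared fixture: both tests below obtain their fetcher the same way."""
    return HttpFetcher() if use_real else FakeFetcher()


def is_reachable(fetcher, domain):
    """Hypothetical consuming code under test."""
    try:
        fetcher.fetch(domain)
        return True
    except OSError:  # ConnectionError and urllib's URLError are both OSError subclasses
        return False


def check_fetch_contract(fetcher):
    """Double-duty test: same assertions for the fake and (opt-in, needs network) the real one."""
    body = fetcher.fetch("example.com")
    assert isinstance(body, str) and body


def check_failure_handling():
    """Fake-only test for the failure path, using the same fixture."""
    fetcher = make_fetcher()
    assert is_reachable(fetcher, "example.com")
    assert not is_reachable(fetcher, FakeFetcher.FAILING_DOMAIN)
```

Running `check_fetch_contract(make_fetcher(use_real=True))` somewhere with network access is what ties the fake to the real behaviour; the failure test then rides on the same fake.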

And sure, this does not gain any testability advantage compared to a mock, but if your test for the failure uses as much as possible of the same code paths as the real implementation, only substituting the fake in, you increase trustworthiness — if the APIs between a fake and a real implementation diverge (a common problem with mocks as tests continue to pass), it's likely to be caught by the double-duty test, and as you adapt your fake to match the new reality, you'll likely start getting the network-failure test to fail too.

In the above example, the only bit you can't fully trust, since it's not automatically tested for both implementations, is your "emulation" of the failure: you want to be careful about how you implement that so it really happens in comparable circumstances to the real implementation (e.g. it's OK to throw an error where you would otherwise be calling out over the network and returning data).

A lot of it depends on how you structure the code. In-memory database fakes are the easiest examples, because it's clear to most people how you can structure the code to have a facade API that's used everywhere, and only have the final fake/real implementations that either do stuff in memory or on the actual database. But you can generally do that with anything.

In general, testing is never equivalent to proving code works correctly, but I think this is the closest you can get (with a healthy dose of fuzzing on top).

However, I've found that most software engineers don't believe this level of trust in the code is achievable without much effort. But "showing the code" has managed to convince most. It does require a switch in mindset, but it's quite similar to accepting that real TDD is possible for anything but toy problems (I don't think it's the most efficient way to develop, but I think it is possible and teaches people to write testable code).