by BonsaiDen 3743 days ago
One good way to limit the breakage in such cases is to perform only black-box tests at the API level. For our Node.js-based backends we never write a single classical unit test; instead we have a custom framework built on top of Mocha which runs tests on the HTTP layer against all of our endpoints.

This works remarkably well in practice and allows for large-scale refactorings under the hood with little to no impact on the tests. We can also mock databases, memcached, redis and graylog at their respective HTTP/TCP/UDP level. This in turn means no custom-built mocks which could break when refactoring. The tests themselves also contain no logic; they are pretty much just chained method calls with data that should go in and an expected response that should come out, along with a specification of all external resources our API fetches during the request and their responses. Any unexpected outgoing HTTP request from our server will actually result in a test failure.
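
A minimal Python sketch of that idea, for illustration only. It is not the commenter's Node.js/Mocha framework; every name here (ExternalMocks, handle_request, the customer.example URL) is hypothetical. The test declares the outgoing requests it expects together with canned responses, and any undeclared outgoing request fails the test:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration: (status code, JSON body).
Response = Tuple[int, dict]
Fetch = Callable[[str], Response]


class UnexpectedOutgoingRequest(AssertionError):
    """Raised when the code under test calls a URL the test did not declare."""


class ExternalMocks:
    """Registry of the outgoing requests a test expects, with canned responses."""

    def __init__(self) -> None:
        self._expected: Dict[str, Response] = {}
        self.calls: List[str] = []

    def expect(self, url: str, status: int, body: dict) -> "ExternalMocks":
        self._expected[url] = (status, body)
        return self  # returning self allows chained declarations

    def fetch(self, url: str) -> Response:
        if url not in self._expected:
            raise UnexpectedOutgoingRequest(url)
        self.calls.append(url)
        return self._expected[url]

    def assert_all_used(self) -> None:
        unused = set(self._expected) - set(self.calls)
        assert not unused, f"declared but never requested: {sorted(unused)}"


def handle_request(path: str, fetch: Fetch) -> Response:
    """Toy endpoint standing in for a real API: enriches customer data."""
    if path == "/users/42":
        status, customer = fetch("https://customer.example/api/users/42")
        if status != 200:
            return 502, {"error": "upstream failure"}
        return 200, {"id": 42, "name": customer["name"].title()}
    return 404, {"error": "not found"}


def test_user_endpoint() -> None:
    mocks = ExternalMocks().expect(
        "https://customer.example/api/users/42", 200, {"name": "ada lovelace"}
    )
    status, body = handle_request("/users/42", mocks.fetch)
    assert (status, body) == (200, {"id": 42, "name": "Ada Lovelace"})
    mocks.assert_all_used()


test_user_endpoint()
```

The test body contains no logic of its own: data in, expected response out, plus the list of external resources. The implementation behind handle_request can be rewritten freely as long as that contract holds.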

As for scaling this approach, from our experience it works quite well, especially when you have lots of complicated interactions with customer APIs during your requests since the flows are super quick to set up.

9 comments

That's the main reason I wrote this framework (mainly focused on Django for now, but able to do much more than that):

http://hitchtest.com

I took this approach on several different projects, but I figured that a lot of the boilerplate/infrastructural code that you need to actually write these kinds of tests is poor or simply not available.

For example, declaratively starting and running multiple services together, in parallel and at the right time (with service dependencies) and printing their logs out.
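
A rough Python sketch of the dependency-ordering part, assuming Python 3.9+ for the standard-library graphlib module. This is not Hitch's API; the service names and the start_batches helper are made up for illustration. Services are declared with their dependencies, and the helper works out which ones can be started in parallel at each step:

```python
from graphlib import TopologicalSorter

# Hypothetical declaration: each service maps to the services it depends on.
SERVICES = {
    "postgres": set(),
    "redis": set(),
    "django": {"postgres", "redis"},
    "celery": {"redis", "django"},
}


def start_batches(services):
    """Return batches of services; each batch can be started in parallel."""
    sorter = TopologicalSorter(services)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are up
        batches.append(ready)
        sorter.done(*ready)  # mark the batch as started so dependents unlock
    return batches


for batch in start_batches(SERVICES):
    print("start in parallel:", ", ".join(batch))
```

A real runner would launch each batch as subprocesses and wait for them to report ready before calling done; the sketch only computes the order.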

Or, mocking the forward passage of time. (click something -> move forward a week -> click something else).
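
One common way to get that, sketched in Python under the assumption that the code under test takes its clock as a parameter instead of reading the system time directly. FakeClock and Trial are hypothetical names, not part of any framework mentioned here:

```python
from datetime import datetime, timedelta


class FakeClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start: datetime) -> None:
        self._now = start

    def now(self) -> datetime:
        return self._now

    def advance(self, delta: timedelta) -> None:
        self._now += delta


class Trial:
    """Toy object under test: a trial that expires 7 days after sign-up."""

    def __init__(self, clock: FakeClock) -> None:
        self._clock = clock
        self._started = clock.now()

    def expired(self) -> bool:
        return self._clock.now() - self._started >= timedelta(days=7)


clock = FakeClock(datetime(2015, 1, 1))
trial = Trial(clock)               # "click something"
assert not trial.expired()
clock.advance(timedelta(weeks=1))  # "move forward a week"
assert trial.expired()             # "click something else"
```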

Or, asynchronously 'listening' using epoll for emails on a mock SMTP server that logs out the emails it receives.

Selenium's good for the web interaction stuff, but you need much much more than that to be able to effectively test at this level.

I think this kind of testing would be much more widely used and effective if the tools available were up to scratch.

I'd say do both. Unit tests often help me to make my code more readable/decoupled and also help to spot potential problems early on. They also act as a kind of abstract documentation for how things are supposed to behave/work. But functional/integration tests are what matter most for being confident about deploying big changes, because you can ensure that all the endpoints that are actually consumed by clients work as they should. They don't help much with spotting problems on the code level, though.
For smaller projects I definitely agree with having both unit and integration tests, especially for libraries. One thing to look out for is the fact that you can always "cheat" in unit tests, e.g. you can be "lazy" and set up some internal state directly in the test to skip huge amounts of initialization. This of course becomes a problem when there's no actual integration test making sure that the exact state can also be triggered from the outside when using the API. In my experience, ensuring that these cases are always covered can become pretty complicated once the project grows.

In our special case we have about 100 different endpoints, all versioned and all dependent on multiple endpoints from the (rather badly documented) APIs of our customer. Most of the work our API does is spent combining / enriching the customer's data and performing integration across the subresources. Setting up individual mocks for every single one of these complex request flows manually is pretty much impossible at this scale.

So doing black-box testing and enforcing 100% test coverage (best for avoiding dead code) helps keep us sane. In the end we don't care so much about how the implementation behind our HTTP response looks as long as we return the correct data. The code itself still has to look good though :)

Yeah, well, you just test what you own anyway. For other stuff you could just grab some kind of dummy response and use that for testing your manipulation of that data, but of course you have to trust your customer's endpoints to return the data in the correct format because that is out of your control.
Trust, but verify.

We've always found that writing your own smoke tests for the other guy's code saves a lot of head scratching and the game I like to call Blame Tennis, when each side insists that any new problem must be in the other side's stuff because surely WE haven't broken anything.

These have become known in Pivotal as "frenemy tests". Poke a remote API to see that a) it is running b) it hasn't dropped a world-stopping change.

Typically run as a pre-build sanity check when you have remote integrations.
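
A small Python sketch of what such a check might look like, as an illustration and not Pivotal's actual tooling. frenemy_check, the URL and the field names are all hypothetical, and the HTTP call is passed in as a function so the sketch runs without a network:

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical transport signature: url -> (status code, parsed JSON body).
Fetch = Callable[[str], Tuple[int, dict]]


def frenemy_check(fetch: Fetch, url: str, required_fields: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the remote API looks sane."""
    try:
        status, body = fetch(url)
    except OSError as exc:  # covers connection refused, timeouts, DNS failures
        return [f"{url} unreachable: {exc}"]
    problems = []
    if status != 200:
        problems.append(f"{url} returned HTTP {status}")
    missing = [field for field in required_fields if field not in body]
    if missing:
        problems.append(f"{url} response is missing fields: {missing}")
    return problems


def fake_fetch(url: str) -> Tuple[int, dict]:
    """Stand-in for a real HTTP call; pretends upstream renamed 'email'."""
    return 200, {"id": 1, "email_address": "a@example.com"}


for problem in frenemy_check(fake_fetch, "https://partner.example/v1/users/1", ["id", "email"]):
    print("FRENEMY CHECK FAILED:", problem)
```

In a pre-build step, a non-empty problem list would abort the build before anyone starts a round of Blame Tennis.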

On some Labs projects we've written extensive request tests and stub services based on documentation, then handed those to the upstream service. Usually there is some angst at this point, being the first time that any kind of TDD suite has turned up to ask awkward questions.

I still maintain F5 owes me a job. I did QA for them for a year back in the dot-com boom. The first couple of versions of BigIP didn't really support session affinity, despite being the flagship feature. I think we filed something like 6-8 pretty big bugs, all in different parts of the code.

Sadly, despite not working, they were still about 4 years ahead of their competitors.

This is exactly why we created Newman (https://github.com/postmanlabs/newman/). Hitting every endpoint of our API gives enormous confidence when deploying it to production.

(Disclaimer: I work on Newman as a part of my day job)

Yeah, I mostly prefer end-to-end tests as well. Though to be fair, they are often slower than unit tests, because you need to start up the whole system. And they are worse at pinpointing problems, though that doesn't seem to be a big deal in practice.
I like to test the whole system via end-to-end tests, as they're the best bang for the buck. And then I'll create unit tests for more algorithmic code, like a parser, sort algorithm, shortest path calculator, financial calculations. Those also tend to require the least amount of test context setup, making them less painful to write.
I have almost always worked for customers who enjoyed changing their minds arbitrarily and often in ways they swore they would never do.

In the face of grossly changing requirements, I've never had much luck keeping E2E tests up and functioning properly. And people have a bad habit of investing more time and energy than a particular test is worth in trying to keep it working or porting it to the new requirements.

Unit tests are cheap. If a requirements change invalidates twenty of them, you just delete them and write new ones. Easy.

I in part agree with the pinpointing, though in my experience this really boils down to an issue of scope and how much of the data you want to test in each of your tests. E.g. an API returning user data, do you have one test for the whole set of data or one test per field.

For our use case we have some pretty "fancy" deep-equals logic for nested structures and allow specific fields to be marked as "to be ignored" in our test expectations.
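
A guess at the general shape of such a comparison, sketched in Python. This is not the commenter's implementation; IGNORE and deep_diff are hypothetical names. An expectation can mark a field as ignored, and every mismatch is reported with its path, which also helps with the pinpointing problem:

```python
IGNORE = object()  # sentinel: any value is accepted, but the key must exist


def deep_diff(expected, actual, path="$"):
    """Return a list of human-readable differences between two nested structures."""
    if expected is IGNORE:
        return []
    if isinstance(expected, dict) and isinstance(actual, dict):
        diffs = []
        for key in sorted(expected.keys() | actual.keys(), key=str):
            child = f"{path}.{key}"
            if key not in actual:
                diffs.append(f"{child}: missing")
            elif key not in expected:
                diffs.append(f"{child}: unexpected")
            else:
                diffs.extend(deep_diff(expected[key], actual[key], child))
        return diffs
    if isinstance(expected, list) and isinstance(actual, list):
        if len(expected) != len(actual):
            return [f"{path}: expected {len(expected)} items, got {len(actual)}"]
        diffs = []
        for index, (exp_item, act_item) in enumerate(zip(expected, actual)):
            diffs.extend(deep_diff(exp_item, act_item, f"{path}[{index}]"))
        return diffs
    if expected == actual:
        return []
    return [f"{path}: expected {expected!r}, got {actual!r}"]


expected = {"id": 1, "created_at": IGNORE, "tags": ["a", "b"]}
actual = {"id": 1, "created_at": "2015-06-01T12:00:00Z", "tags": ["a", "c"]}
print(deep_diff(expected, actual))  # prints ["$.tags[1]: expected 'b', got 'c'"]
```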

When it comes to speed, the most important part is to cut out everything you don't need. For us this means that our testing framework completely throws away the internals of Node.js's HTTP layer. We never create a single TCP socket in our tests, which gives an incredible speed-up. As a bonus this also allows us to test timeouts and low-level HTTP errors in < 1ms, since we can just return the timeout directly from the low-level APIs and Node.js will invoke the timeout event on the HTTP client handle.
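
The same trick can be sketched in Python. This is an analogy only and does not touch Node.js internals; FakeTransport and get_profile are hypothetical. The transport is replaced by an object that never opens a socket and fails immediately, so the timeout path is exercised without actually waiting for a timeout:

```python
class FakeTransport:
    """Stands in for the socket layer; never opens a connection."""

    def __init__(self, outcome):
        # outcome is either a (status, body) tuple or an exception to raise
        self._outcome = outcome

    def request(self, url, timeout):
        if isinstance(self._outcome, Exception):
            raise self._outcome  # simulate the low-level failure right away
        return self._outcome


def get_profile(transport, user_id):
    """Toy code under test: maps low-level failures to API-level responses."""
    url = f"https://customer.example/users/{user_id}"
    try:
        status, body = transport.request(url, timeout=30)
    except TimeoutError:
        return 504, {"error": "upstream timeout"}
    except ConnectionError:
        return 502, {"error": "upstream unreachable"}
    return status, body


# The 30 second timeout is never waited on; the fake raises at once.
assert get_profile(FakeTransport(TimeoutError("simulated")), 7) == (
    504,
    {"error": "upstream timeout"},
)
```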

> Running "noir:mock" (noir) task
> ......................................................
> 1048 passing (10s)

And yes, that really reads 10 seconds and yes I always get a bit bored when I have to run the tests for one of our Django backends, feels like an eternity... :)

Count yourself lucky that your test takes 10 seconds. A partial build on my project takes ~3 minutes to compile + link. A full build is about 3 hours for everything.
I recently cut some link times from 3 minutes to 14 seconds by switching linkers (from bfd to gold doing Android dev.)

...sadly that's still only for a single project x config x arch combination. I still need to play around with incremental linking options that appear off by default...

I had a build that took 25 hours to run once. Never again. The only reason we got any work done is that we ran a 1, 3 and 19 hour part of it in parallel. But I really just wanted to throw out the long part (bad E2E tests) and start over.
> though that doesn't seem to be a big deal in practice

Gotta be careful there, chief. "In practice" usually refers to "your experience" (which might not be mine).

My experience is on the flip side: end-to-end tests are super slow due to various reasons...

Functional tests are the best kind of tests. While unit tests are nice sanity checks when implementing tricky methods, when the rubber hits the road, you want to know if the whole app works as you expect it to.
If your tests break when refactoring, something is wrong. Probably, you're testing the implementation, not the behavior.
The behaviour of internal components is part of the implementation of the whole application.
In my experience, for any non-trivial application with a non-trivial number of tests, some percentage of those tests, particularly the unit tests, will have by accident or carelessness come to rely on internal implementation details that aren't reflected in the result.

Those kinds of "atrophied" tests can make a refactor considerably more painful than it should be.

Yep. In other words, test the result, not the implementation.
This is a good idea; hence, expect hate from TDD purists.
Can you say that API black box tests are strictly denoting "what" is broken, while unit and functional tests would tell you "where?"