| HN Mirror


by jrockway 2443 days ago

Your integration test needed the system that you were integrating with, so you'd have to declare that as a dependency.

My philosophy was to always have integration tests run in the normal CI system. This basically meant creating a test binary that happened to link in the systems you were integrating with, and running tests against that. This is easier when everything is written in the same programming language, and for the cases where it wasn't, I was usually happy with "fakes". (https://testing.googleblog.com/2013/06/testing-on-toilet-fak...)
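To make the "fakes" idea concrete, here is a minimal Python sketch. `UserDirectory`, `FakeUserDirectory`, and `greeting_for` are hypothetical names for illustration only. The point is that the code under test depends on an interface, so the test binary can link in an in-memory fake instead of talking to a live instance of the other service.

```python
from typing import Dict, Optional, Protocol


class UserDirectory(Protocol):
    """Hypothetical interface of the service we integrate with."""

    def display_name(self, user_id: str) -> Optional[str]:
        ...


class FakeUserDirectory:
    """In-memory fake: same contract as the real service, no network, no setup."""

    def __init__(self, users: Dict[str, str]) -> None:
        self._users = dict(users)

    def display_name(self, user_id: str) -> Optional[str]:
        return self._users.get(user_id)


def greeting_for(user_id: str, directory: UserDirectory) -> str:
    """Code under test: depends only on the interface, not a concrete server."""
    name = directory.display_name(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"


if __name__ == "__main__":
    fake = FakeUserDirectory({"u1": "Ada"})
    print(greeting_for("u1", fake))  # Hello, Ada!
    print(greeting_for("u2", fake))  # Hello, stranger!
```

A fake like this only helps if it behaves like the real thing, so in practice it is best owned by the team that owns the real service.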

Other teams really loved the sandbox environment with live instances of everything. They would have some machinery outside the standard CI system to inject their code into this sandbox and run some tests, as well as machinery for keeping their sandbox up to date with production. (And adding test data, etc., etc., which all becomes very complex very quickly.)

Both methodologies have their downsides and upsides.

I generally prefer simplicity and speed; people should be able to run the tests on their workstations 100% of the time without having to set up any external resources. If you have an integration test binary that is built by the build system, this is possible. The downside is that config changes in production can break your system without your tests noticing: since you are starting up your own instance of some other team's server, they could theoretically make some config change that breaks your integration. Even if you include their configuration in your in-memory version of their service, there is no guarantee that what is running in production has actually been checked in yet. (Debugging in production, emergency rollback to an older prebuilt binary, etc.) These breakages were rare and never caused me problems, however, and not having machinery to maintain a shadow environment meant it was easier to work on the code.
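A rough sketch of "starting up your own instance of some other team's server" inside the test process, assuming the dependency speaks HTTP. `PriceHandler` stands in for the other team's server and `PRICE_SERVICE_CONFIG` for their checked-in config; both are hypothetical. Binding to port 0 lets the OS pick a free port, so nothing outside the process is needed. It also illustrates the downside above: the test exercises whatever config is checked in, not what is actually live.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical config for the other team's service, checked in next to the test.
PRICE_SERVICE_CONFIG = {"currency": "USD", "prices": {"widget": 250}}


class PriceHandler(BaseHTTPRequestHandler):
    """Stand-in for the other team's server; a real setup would link their code."""

    def do_GET(self) -> None:
        item = self.path.lstrip("/")
        price = PRICE_SERVICE_CONFIG["prices"].get(item)
        body = json.dumps({"item": item, "cents": price,
                           "currency": PRICE_SERVICE_CONFIG["currency"]}).encode()
        self.send_response(200 if price is not None else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep test output quiet
        pass


def start_hermetic_server() -> HTTPServer:
    """Bind to port 0 so the OS picks a free port; serve from a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), PriceHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_price(base_url: str, item: str) -> int:
    """Client code under test."""
    with urllib.request.urlopen(f"{base_url}/{item}") as resp:
        return json.load(resp)["cents"]


if __name__ == "__main__":
    server = start_hermetic_server()
    try:
        host, port = server.server_address
        print(fetch_price(f"http://{host}:{port}", "widget"))  # 250
    finally:
        server.shutdown()
        server.server_close()
```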

Having a sandbox environment was good because you could "check" (not test) big changes before putting them into production. You could try out your flag flip, database migration, mapreduce, or just load up the website in your browser and send your coworkers a link without affecting production data. And you could test your actual production binary in production-like conditions; as long as you synced production changes to your sandbox, your automated test probably ran against something that was very much like production. This let you check for more subtle things like performance regressions before deploying. (I worked on a system to do just that.)
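As a hypothetical illustration of the kind of check a sandbox enables (not the system mentioned above), a crude latency-regression gate might compare a high percentile of baseline and candidate latencies. The nearest-rank percentile, the p99 choice, and the 10% tolerance are all assumptions for the sketch.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def is_latency_regression(baseline_ms: Sequence[float],
                          candidate_ms: Sequence[float],
                          p: float = 99.0,
                          tolerance: float = 0.10) -> bool:
    """Flag the candidate if its p-th percentile latency exceeds the
    baseline's by more than `tolerance` (10% here, an arbitrary choice)."""
    return percentile(candidate_ms, p) > percentile(baseline_ms, p) * (1 + tolerance)


if __name__ == "__main__":
    baseline = [10.0] * 99 + [20.0]
    candidate = [10.0] * 98 + [15.0, 30.0]
    print(is_latency_regression(baseline, candidate))  # True
```

Real performance checks need many more samples and some care about noise; this only shows the shape of the comparison.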

The main problem I had with this method was that it was maintenance-intensive (big teams that used this had entire teams just to maintain the sandbox, and that begat sub-teams that maintained the sandbox maintenance) and slow. Building and running another test during CI was relatively fast, but starting up a job in production and scaling it up was significantly slower. This meant that you needed a parallel set of tools to run some subset of this environment locally, and it was always painful. Not having your tests in the standard system meant that downstream dependencies wouldn't see test failures in your system when you made a change, so the "buildcop" would have to detect and fix that.

I found this to be too much overhead, but it is probably necessary when you are developing, say, a mobile application. You will have to write some sort of software to make it possible to try your in-progress code on your personal phone. You will probably want to be able to share links with coworkers. I generally like to push changes to production multiple times a day, and make sure that clients can handle a newer server and still work correctly. This way, as soon as a build passes tests, you can start giving it, say, 0.1% of production traffic and keep an eye on the error rates, and promote that to production as quickly as possible. The biggest problem I've run into with this strategy is that 0.1% of Google's traffic is way more than enough for a good canary, but at other places I've worked... 0.1% of traffic might be one request over several days. In that case, you have to have staging and manually bug people to try it out. Sometimes I wonder if that kind of software is worth writing at all, to be perfectly honest. If you get one request a day, maybe just make it open a support ticket, and hire 2 support engineers instead of one software engineer. But I digress ;)
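A minimal canary sketch, assuming requests carry a stable ID that can be hashed to pick the 0.1% slice, and a simple "enough traffic, error rate not much worse than baseline" promotion rule. `in_canary`, `should_promote`, and their thresholds (`min_requests`, `max_ratio`) are hypothetical placeholders, not any particular rollout system's API.

```python
import hashlib


def in_canary(request_id: str, fraction: float = 0.001) -> bool:
    """Deterministically send a stable slice of traffic (0.001 == 0.1%) to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 1_000_000
    return bucket < round(fraction * 1_000_000)


def expected_canary_requests(requests_per_day: float, fraction: float, days: float) -> float:
    """How much traffic the canary will actually see over a window."""
    return requests_per_day * fraction * days


def should_promote(canary_errors: int, canary_total: int, baseline_error_rate: float,
                   min_requests: int = 1000, max_ratio: float = 1.5) -> bool:
    """Promote only once the canary has enough traffic to judge and its
    error rate is at most `max_ratio` times the baseline's."""
    if canary_total < min_requests:
        return False  # too little traffic to say anything
    return canary_errors / canary_total <= baseline_error_rate * max_ratio
```

The arithmetic shows the small-traffic problem: at 300 requests a day, `expected_canary_requests(300, 0.001, 3)` is roughly 0.9, so the canary sees less than one request in three days and `min_requests` is never reached.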


strbean 2442 days ago

Tangentially:

I've seen several blog posts from Google about using fakes and 'hermetic servers' for testing. We use GCP for our product, and unfortunately, Google doesn't seem to care much about making this easy. For example, I think I saw only one or two languages for which the Google Storage client libraries provided "fakes" of a Google Storage server. For PubSub (and maybe one or two other services?) there is the PubSub Emulator, which is unfortunately in Java and isn't supported by any of the CLI tools.

For all their love of fakes and hermetic servers, it would be awesome if they provided them for all the GCP services.
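For a language without an official fake, a team can write a small one itself. Below is a hypothetical in-memory pub/sub fake; `FakePubSub` and `record_signup` are illustrative names, and the methods are not the GCP client library API or the PubSub Emulator. Each subscription gets its own copy of messages published to its topic after it was created, roughly the fan-out behavior a test usually cares about.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List


class FakePubSub:
    """Hypothetical in-memory stand-in for a pub/sub service (not the GCP API)."""

    def __init__(self) -> None:
        self._subs_by_topic: Dict[str, List[str]] = defaultdict(list)
        self._queues: Dict[str, Deque[bytes]] = {}

    def subscribe(self, topic: str, subscription: str) -> None:
        self._subs_by_topic[topic].append(subscription)
        self._queues[subscription] = deque()

    def publish(self, topic: str, data: bytes) -> None:
        # Fan out: every subscription on the topic gets its own copy.
        for sub in self._subs_by_topic.get(topic, []):
            self._queues[sub].append(data)

    def pull(self, subscription: str, max_messages: int = 10) -> List[bytes]:
        queue = self._queues[subscription]
        return [queue.popleft() for _ in range(min(max_messages, len(queue)))]


def record_signup(bus: FakePubSub, email: str) -> None:
    """Code under test: publishes an event to an injected bus."""
    bus.publish("signups", email.encode("utf-8"))
```

A hand-rolled fake skips things like acknowledgement deadlines and redelivery, which is exactly why an officially maintained one would be more trustworthy.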


w-m 2442 days ago

Wow, thanks for the detailed reply. You mentioned a couple of implementations that I hadn't thought about. But I guess the short version would be, as so often: testing systems is hard, and there's no one-size-fits-all solution.
