by Fluxx 5168 days ago
In my opinion, having to replicate S3 in development and test isn't the best idea. I see a few problems: you have tied yourself to S3's API; you must maintain this "other" S3 and make sure it behaves like the real S3; and your test and development code never actually hits the real API you're using until staging or production.

There are a few better strategies I can see here:

1. For test, use something like VCR[1] to record real HTTP interactions with the real S3 API during first test runs, serialize them to disk, and then replay them later.

2. Go the more OO route and create an internal business object with a defined interface that handles persistence of your objects. You could have an S3Persister for production and staging, but then you can create a LocalDiskPersister or even a MemoryPersister for tests. Hell, you can even keep your own S3 and create an OurS3Persister as well. The main point here is that your application code is coded to one API/interface - the "persister" - and you can easily swap in different persisters for different reasons. Each individual persister can then have its own tests that guarantee it adheres to the Persister interface and does its own individual things correctly.

3. Mock out the calls to your S3 library. It's the job of the library to give you, the application developer, an interface to S3, so you can mock out those API calls and trust that the library works and is doing the right thing. Since you're mocking things out, you should still have integration tests with the real S3 to verify everything is working, but for quick unit tests mocking works great.
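To make strategy 2 concrete, here is a minimal sketch in Python. The `put`/`get` method names and the `check_persister_contract` helper are assumptions for illustration, not something from the blog post; an S3Persister would implement the same two methods on top of whatever S3 library you use.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Persister(ABC):
    """The single interface application code depends on (hypothetical)."""

    @abstractmethod
    def put(self, key, data):
        """Store bytes under key."""

    @abstractmethod
    def get(self, key):
        """Return the bytes stored under key."""


class MemoryPersister(Persister):
    """Keeps objects in a dict; handy for fast unit tests."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class LocalDiskPersister(Persister):
    """Writes each object to a file under a root directory."""

    def __init__(self, root):
        self._root = root

    def _path(self, key):
        # Flat layout: keys are assumed not to contain path separators.
        return os.path.join(self._root, key)

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


def check_persister_contract(persister):
    """Shared round-trip check that every implementation should pass."""
    persister.put("greeting", b"hello")
    return persister.get("greeting") == b"hello"


# The same contract check runs against each implementation.
memory_ok = check_persister_contract(MemoryPersister())
disk_ok = check_persister_contract(LocalDiskPersister(tempfile.mkdtemp()))
```

Because the application only ever sees `Persister`, swapping implementations per environment is a one-line change wherever the object is constructed.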

The blog post mentioned they had gigabytes of data, so YMMV on these ideas, but these are strategies I and others have used in the past when dealing with APIs like S3 and they work great.

[1] https://github.com/myronmarston/vcr

2 comments

Excellent points.

We work on the idea of different stages in the test and development pipeline. At different stages mock objects make sense, and at other stages having something like Fake S3 makes more sense.

For testing, the first stage would be unit testing. At that stage it is best to mock out your S3 interactions (with something like VCR or WebMock) and use an OO approach to wrap your persistence, so you could swap out S3 with another persistence engine without breaking APIs.
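For readers who haven't used a record-and-replay tool, the idea can be sketched in a few lines of Python. This is a toy illustration of the concept only, not VCR's or WebMock's actual API; the `Cassette` class, the request key, and `fetch_from_s3` are all hypothetical names.

```python
import json
import os
import tempfile


class Cassette:
    """Toy record/replay store: the first call is live, later calls replay."""

    def __init__(self, path):
        self.path = path
        self.recorded = {}
        if os.path.exists(path):
            with open(path) as f:
                self.recorded = json.load(f)

    def call(self, key, live_fn):
        # Replay a previously recorded response if we have one.
        if key in self.recorded:
            return self.recorded[key]
        # Otherwise make the real call and serialize the result to disk.
        result = live_fn()
        self.recorded[key] = result
        with open(self.path, "w") as f:
            json.dump(self.recorded, f)
        return result


calls = []


def fetch_from_s3():
    # Placeholder for a real HTTP request to S3.
    calls.append(1)
    return {"status": 200, "body": "hello"}


cassette_path = os.path.join(tempfile.mkdtemp(), "get_object.json")
first = Cassette(cassette_path).call("GET /bucket/key", fetch_from_s3)   # records
second = Cassette(cassette_path).call("GET /bucket/key", fetch_from_s3)  # replays
```

The second call never reaches `fetch_from_s3`, which is why replayed unit tests stay fast and work offline.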

The second stage for us is integration testing where you might have multiple machines testing across the network. In this situation, I think it is great to have real network requests happening rather than mock requests. Also you can deal with real files (especially important with media files like images and video).

The last stage is taking out Fake S3 and using a true S3 connection to ensure that everything does work on a production environment (cuz Fake S3 could be faking you out, especially on things like authentication and versioning). We do that by launching a stage cluster and running a set of integration tests on that before doing a production release. Ideally, the first and second stages catch any errors before you start doing tests against the real AWS services.

As for the development pipeline, being able to work with real assets while you are building mobile or web interfaces is really useful. So is simulating latency to see how interfaces respond on a slow network connection, which would be difficult to truly mock.

Awesome, thanks for the extra info. I think your setup sounds really good :)

It's fine to mock, but for serious S3 users you need to emulate the exact behavior or you are setting up your users for failure.

Isn't this just a method of implementation for your option 3? I don't really see a substantial difference between mocking the server and mocking the API.

The same caveat still applies about needing an integration test with the real S3 in either case.

> Isn't this just a method of implementation for your option 3? I don't really see a substantial difference between mocking the server and mocking the API.

The short answer is that there is no difference. Just as you could mock out a call to S3API.get(object_id) and have it return my_object, you could write a server that responds to the S3 API call for getting object_id.

The long answer is that mocks are a lot quicker to develop, easier, more straightforward, and faster at run time than maintaining a real runnable copy of S3 that behaves exactly like the real S3. With the fake S3 you're still spending CPU cycles inside your S3 client while it talks HTTP with your fake S3, which slows unit tests down a lot. Plus, fake S3 may behave slightly differently when your S3API library interacts with it, which could lead to really hard-to-track-down bugs later on. Trusting your APIs is what unit testing is all about.
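As a rough sketch of the mock side of that comparison, here is what stubbing the library call might look like with Python's standard `unittest.mock`. The `get` method mirrors the S3API.get(object_id) example above; `load_title` and the returned dict are made up for illustration.

```python
from unittest.mock import Mock


def load_title(s3_api, object_id):
    """Application code under test: fetch an object and read its title."""
    return s3_api.get(object_id)["title"]


# Stand-in for the S3 library; no HTTP and no server involved.
fake_api = Mock()
fake_api.get.return_value = {"title": "my_object"}

title = load_title(fake_api, "object-123")

# Verify the application asked the library for the right object.
fake_api.get.assert_called_once_with("object-123")
```

Nothing here exercises the real S3, so the integration-test caveat from earlier in the thread still applies.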

Well, there's at least one pretty major difference: you have persisted data. That might not be ideal for test, although in production you will already have persisted data, so testing with an empty store might not make sense either. But for local development or a staging server, being able to recall the data you've stored across processes can be quite handy.