by jbeda 3926 days ago
Took a quick look at the API. For context, I was involved in the early days of Google Cloud Storage.

It is surprising that they didn't make it compatible with the S3 API -- at least for common object/bucket create/delete. This will require more code to be written and it will be harder to adapt client libraries.

The API documentation is here: https://www.backblaze.com/b2/docs/

Other notes:

* The lack of scalable front-end load balancing is shown by the fact that they require users to first make an API call to get an upload URL followed by doing the actual upload.

* They require a SHA1 hash when uploading objects. This is probably overkill over a cheaper CRC. In addition, it means that users have to make 2 passes to upload -- first to compute the hash and then another to upload. This can slow uploads of large objects dramatically. A better method is to allow users to omit the hash and return it in the upload response. Then compare that response with a hash computed while uploading. In the rare case that the object was corrupted in transit, delete/retry. GCS docs here: https://cloud.google.com/storage/docs/gsutil/commands/cp#che...
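
The single-pass idea in the last bullet can be sketched roughly as follows. This is a minimal illustration, not B2's or GCS's actual client code: `HashingReader`, `upload_single_pass`, and the `send_chunk` callback are hypothetical names, and the network side is stood in for by a plain callable. The point is only that the SHA1 is updated as each chunk is read, so the data is read once; the caller would then compare the returned digest with whatever hash the server reports.

```python
import hashlib
import io


class HashingReader:
    """Wraps a binary file-like object and hashes every byte that is read."""

    def __init__(self, raw, algo="sha1"):
        self._raw = raw
        self._hash = hashlib.new(algo)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self._raw.read(size)
        self._hash.update(chunk)  # hash is updated as data streams through
        self.bytes_read += len(chunk)
        return chunk

    def hexdigest(self):
        return self._hash.hexdigest()


def upload_single_pass(source, send_chunk, chunk_size=64 * 1024):
    """Stream `source` to `send_chunk` (a hypothetical network writer)
    and return the SHA1 of exactly the bytes that were sent."""
    reader = HashingReader(source)
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        send_chunk(chunk)
    return reader.hexdigest()
```

If the digest returned here differs from the one in the upload response, the client deletes the object and retries.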


> It is surprising that they didn't make it compatible with the S3 API .... The lack of scalable front-end load balancing is shown by the fact that they require users to first make an API call to get an upload URL

And... you answered your own question. :-) We reduce our operating costs by not having as many load balancers in the datacenter and pushing off the responsibility to the API. It all comes from our traditional backup product where we wrote all the software on both sides so we could save money this way.

With that said, we are actively considering offering an S3 compatible API for a slightly higher cost (basically what it would cost us to deploy the larger load balancing tech).

I, for one, prefer the directness of not having to go through a front-end proxy. It probably eliminates some failure modes. I think that, instead of Backblaze providing an S3-compatible API, someone should do an open-source S3-compatible front-end for B2, that any interested user can run on a cheap VPS.
I work at https://kloudless.com. While not S3-compatible, we offer a similar proxy that provides a single API to multiple storage services such as Dropbox, Box, Google Drive, SharePoint, etc. We've also released support for S3 and Azure and are looking into B2.
If someone writes a jclouds provider for B2, you can then use https://github.com/andrewgaul/s3proxy to interface with it
The lack of an S3-compatible API is the only thing preventing us from migrating to your cheaper option right now.
Wouldn't be very hard to make your own proxy.
Famous last words :)
> They require a SHA1 hash when uploading objects. This is probably overkill over a cheaper CRC.

Having been on the receiving end of entirely too many corrupted files in my life, I strongly approve of their use of a hash that's been standardized and fast for decades and remains cryptographically strong. "But it's fast" isn't very helpful if you fail to store the data intact. TCP has a CRC too. We're wallpapering over it with better ones and everyone serious has been for years: it's time to accept that cheap CRCs aren't a good place to get stuck.

Improving the API to avoid the 2-pass problem is spot-on though. Another possible solution is to require either a subsequent API call, or format the first message as a multipart, and use that route to have the caller submit the hash that's used to confirm and commit the file to storage after the body upload. This would solve the 2pass problem while still ensuring the client is actually doing the integrity check -- and since Backblaze is more than likely to take the heat on any corruption issues, it's probably a good policy for them to make sure lazy client implementations aren't going to cause problems that their storage then gets the publicity smear for.
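
To make the "hash after the body" idea concrete, here is a rough server-side sketch under stated assumptions. `StagingStore` and its `begin`/`append`/`commit` methods are made-up names for illustration, not anything in the B2 API, and storage is just an in-memory dict. The body is staged first, and the object only becomes visible if the client's SHA1 (sent in a follow-up call or a trailing multipart section) matches what the server hashed on the way in.

```python
import hashlib


class StagingStore:
    """Toy in-memory store: bodies are staged, then committed by hash."""

    def __init__(self):
        self._pending = {}   # upload_id -> (running sha1, buffered bytes)
        self.committed = {}  # object name -> bytes
        self._next_id = 0

    def begin(self):
        self._next_id += 1
        upload_id = str(self._next_id)
        self._pending[upload_id] = (hashlib.sha1(), bytearray())
        return upload_id

    def append(self, upload_id, chunk):
        running_hash, buf = self._pending[upload_id]
        running_hash.update(chunk)  # server hashes while receiving
        buf.extend(chunk)

    def commit(self, upload_id, name, client_sha1_hex):
        """Commit only if the client's hash matches; otherwise discard."""
        running_hash, buf = self._pending.pop(upload_id)
        if running_hash.hexdigest() != client_sha1_hex.lower():
            return False  # staged data is dropped, client must retry
        self.committed[name] = bytes(buf)
        return True
```

The client can compute its hash while streaming, so it still reads the file only once, and a client that never sends a hash never gets a committed object.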

Using a hash or CRC here is totally necessary. Oftentimes the CRCs in TCP fail to catch corruption that happens outside the network stack. Having an end to end check will catch, say, memory bit flips and such after data comes off the wire.

But there is no call for a cryptographic hash here. This isn't being used as any sort of ID or to verify integrity outside of corruption.

No, it's pretty much totally unnecessary.

The API works on top of TLS, which already includes cryptographic authentication of all data (usually via SHA-1/2 HMAC or AES-GCM).

The hash would be computed at the client right after reading from disk and right before TLS encryption, and since they seem to terminate TLS at the storage server it would be computed right after TLS decryption and right before storage, so it doesn't seem to provide any gain.

I think they should just remove it, or at least make it optional.

When operating at scale, you will, once in a while, have corruption. Even if you use ECC RAM, once in a while you'll have a double bit flip. And it doesn't look like Backblaze uses ECC (https://www.backblaze.com/blog/storage-pod-4-5-tweaking-a-pr...) despite good evidence that ECC is necessary (PDF: http://static.googleusercontent.com/media/research.google.co...). Even if you do have ECC, you'll once in a while have a bad NIC with HW offload that will corrupt the TCP stream silently.

This is all rare, but it does happen. This is why the GCS team wants to know if you are seeing corruption on file upload as it might be some bad hardware failing in a non-obvious way.

I just spent 10 or so minutes and it looks like they do use ECC, and per https://news.ycombinator.com/item?id=2786695 see ECC corrections reported in their log files.
As jbeda mentions, hardware errors are one big reason: with the scale S3/Azure/GCS/Backblaze operate at it's a matter of when and not if you're going to run into problems. Also: TLS may guarantee the bits your client sends are the ones their server receives, but that's just one cause of errors.

There's the write path from when B2 receives your bits to when they're stored on disk, for one. You could have unforeseen bugs in the code sitting on the other end of their upload URL (it's probably not all theirs, and even if it was it was written by human developers).

Or B2's internal network path (if they have any) between that and the disk. Ideally that would provide integrity too, but maybe not. They offer a low price point and call out other compromises they make to achieve it (e.g. limited load balancing) - so while I really doubt it, it's remotely plausible they deem the internal overhead of SSL too high.

But then there's the potential for mismatch between "what the customer thinks they uploaded" and "what the customer actually uploaded" too! Less of an issue for now because their API only appears to support uploading files all at once, but eventually I'm sure they'll support a multipart upload scheme like the other platforms do. At which point uploads become more complicated since clients need to retain state and potentially resume. What if a client screws it up and there's some off-by-one error (or whatever)? If you can provide instant feedback, at upload time, that your clients provided bogus data, that's a good thing.

You can argue it's a painful requirement to force on users since it means they have to track/compute it themselves (might be nontrivial for streaming applications), which is fair. But there are enough points of failure, and the numbers so large, that errors happening is a fact and you really need to insure against it. Especially here, your entire reason for existing is to reliably store bits so it's kinda important to get it provably right.

It seems completely sensible to err on the side of caution, especially as a new and relatively unproven platform (as an object storage platform provider I mean, obviously they have tons of experience storing things).

There are many places in your stack where data corruption can and will occur. You are correct that TLS provides payload integrity on a per-packet basis - but it doesn't protect you against silent truncation (to fight this, always declare and check content-length, or use chunked encoding). I have seen corruption occur in NIC buffers, ECC'd main memory, Xen MMU'd memory pages (yes, Xen was responsible), and multiple places in HTTP server and client stacks. None of those failures manifested until hundreds of terabytes of data had successfully gone through the system.
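
The "declare and check content-length" advice amounts to something like the sketch below. It is a generic illustration, not taken from any particular HTTP library: `read_declared` and `TruncatedBody` are hypothetical names, and `stream` is any binary file-like object. The reader refuses to treat an early end-of-stream as a complete body.

```python
import io


class TruncatedBody(Exception):
    """Raised when the stream ends before the declared length is reached."""


def read_declared(stream, declared_length, chunk_size=64 * 1024):
    """Read exactly `declared_length` bytes or raise TruncatedBody."""
    received = bytearray()
    while len(received) < declared_length:
        want = min(chunk_size, declared_length - len(received))
        chunk = stream.read(want)
        if not chunk:
            # End of stream arrived early: this is the silent-truncation case.
            raise TruncatedBody(
                f"expected {declared_length} bytes, got {len(received)}"
            )
        received.extend(chunk)
    return bytes(received)
```

A length check only catches missing bytes; it says nothing about flipped bits, which is why it complements an end-to-end checksum rather than replacing one.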

If you're handling data on behalf of others, it's paramount that you checksum data end-to-end. Amazon S3 allows you to do this by sending the MD5 or SHA along with the data. Google Cloud Storage (GCS) allows you to do this with CRCs (which, despite what others in this thread say, are more appropriate for the task than crypto hashes, as long as you use enough bits).

A cryptographic hash is pretty much as fast as anything else, and lets you be certain. There's no good reason to use anything else.
Apparently, SHA-1 is pretty slow compared to others, about 20x slower than the fastest hash algorithms out there.

https://github.com/Cyan4973/xxHash

You would think that, if it's just being used as a checksum, anything that passes https://code.google.com/p/smhasher/wiki/SMHasher with high marks would be sufficient.
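
Relative speed is easy to measure rather than argue about. Here is a small sketch using only the Python standard library; `throughput_mb_per_s` and `compare_checksums` are hypothetical helper names, and xxHash is left out because it is not in the standard library. The numbers depend heavily on CPU, library build, and buffer size, so treat the output as a rough local comparison only.

```python
import hashlib
import time
import zlib


def throughput_mb_per_s(update, payload, rounds=20):
    """Time `rounds` calls of `update(payload)` and return MB/s."""
    start = time.perf_counter()
    for _ in range(rounds):
        update(payload)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero
    return (len(payload) * rounds) / elapsed / 1e6


def compare_checksums(payload_size=4 * 1024 * 1024):
    """Return a dict of algorithm name -> measured MB/s on this machine."""
    payload = bytes(payload_size)
    return {
        "crc32": throughput_mb_per_s(zlib.crc32, payload),
        "sha1": throughput_mb_per_s(lambda b: hashlib.sha1(b).digest(), payload),
        "sha256": throughput_mb_per_s(lambda b: hashlib.sha256(b).digest(), payload),
    }


if __name__ == "__main__":
    for name, rate in compare_checksums().items():
        print(f"{name}: {rate:.0f} MB/s")
```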

Why would you want 'just' a checksum? I want something I can rely on. If I have to dedicate half a core per gbps of internet-crossing upload, that's not a big deal.
The purpose here is not to secure your data against an attacker (that's what TLS is for), or even against errors in transmission (as others have noted, TLS has you covered there as well) - you need something simple and inexpensive to secure against errors in hardware/memory before/after it enters that pipeline. While you shouldn't under-solve a problem, there are real costs to over-solving the problem as well.
The CRC in TCP is not powerful enough, but CRCs can be adjusted to be arbitrarily powerful. The main advantage of CRCs is that they can be independently computed for multiple parts and combined when concatenating the parts.
The CRC in TCP is weak. CRCs can be made arbitrarily powerful and still retain their key advantage: they can be computed independently and then combined when concatenating the parts. That key advantage, combined with a strong CRC, is what provides an elegant solution to the 2-pass problem.
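
For anyone who wants to see the combining property, here is a sketch using CRC-32 from Python's `zlib`. Python does not expose zlib's `crc32_combine`, so `crc32_combine` below is a hand-written stand-in that relies on the CRC register being linear over GF(2): advance the first part's CRC through `len_b` zero bytes, then XOR in the second part's CRC. It runs in time proportional to `len_b`, unlike the logarithmic-time matrix version in the C library, so it is a demonstration of the property rather than something to ship.

```python
import zlib

_MASK = 0xFFFFFFFF


def crc32_combine(crc_a, crc_b, len_b):
    """Return crc32(A + B) given crc32(A), crc32(B) and len(B).

    Neither A nor B is needed, only their CRCs and B's length.
    """
    # Undo zlib's final XOR so the zero bytes act on the raw register value.
    state = crc_a ^ _MASK
    zeros = bytes(64 * 1024)
    remaining = len_b
    while remaining > 0:
        n = min(remaining, len(zeros))
        state = zlib.crc32(zeros[:n], state)  # shift register by n zero bytes
        remaining -= n
    return (state ^ _MASK) ^ crc_b
```

With this, each part of a multipart upload can be checksummed on its own (even in parallel), and the whole-object CRC is assembled at the end without a second read of the data.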