by jiggawatts
1418 days ago
Something that’s always bugged me about streaming protocols of this type is that they prevent processing pipelining. If trailers are used for things such as checksums, then the client must wait patiently for potentially gigabytes of data to stream to it before it can verify the data integrity and start processing it safely. If the data is sent chunked, then this is not an issue. The client can start decoding chunks as they arrive, each one with a separate checksum.
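To make the chunked idea concrete, here is a rough sketch of per-chunk checksums. The frame layout (4-byte length, SHA-256 digest, payload) and the names `frame_chunks` and `verified_payloads` are made up for illustration and don't correspond to any real protocol.

```python
import hashlib
import struct
from typing import Iterable, Iterator

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def frame_chunks(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Sender side: yield frames of [4-byte length][SHA-256 digest][payload]."""
    for start in range(0, len(data), chunk_size):
        payload = data[start:start + chunk_size]
        header = struct.pack(">I", len(payload))
        yield header + hashlib.sha256(payload).digest() + payload


def verified_payloads(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Receiver side: check each frame on arrival and yield its payload.

    A consumer can start processing the first payload without waiting
    for the rest of the stream.
    """
    for frame in frames:
        (length,) = struct.unpack(">I", frame[:4])
        digest = frame[4:4 + DIGEST_LEN]
        payload = frame[4 + DIGEST_LEN:]
        if len(payload) != length or hashlib.sha256(payload).digest() != digest:
            raise ValueError("chunk failed integrity check")
        yield payload
```

Because `verified_payloads` is a generator, each chunk is checked and handed downstream as soon as it arrives, which is the pipelining that a single trailing checksum rules out.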
1. When transferring large amounts of data, the checksum for the full transfer can't be verified until all the data is received. If you want to (for example) download an Ubuntu ISO and verify its checksum before installing it, you'll have to buffer the data somewhere until the download finishes.
2. When transferring small amounts of data, such as individual chunks, the data integrity is (or should be) automatically verified by the encryption layer[0] of the underlying transport. There's no point in putting a shasum into each chunk because if a bit gets flipped in transit then that chunk will never even arrive in your message handler.
3. In gRPC, chunking large data transfers is mandatory because the library will reject Protobuf messages larger than a few megabytes[1]. As the chunks of data arrive, they can be passed into a hash function at the same time as you're buffering them to disk.
[0] gRPC supports running without encryption for local development, but obviously for real workloads you'd do end-to-end TLS.
[1] IIRC the default for C++ and Go implementations of gRPC is 4 MiB, which can be overridden when the client/server is being initialized. For bulk data transfer there's also the Protobuf hard limit of 2GB[2] for variable-length data.
[2] https://developers.google.com/protocol-buffers/docs/encoding
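As a rough illustration of point 3, here is a minimal sketch of hashing chunks as they arrive while buffering them to disk. It's plain Python with no gRPC involved: `chunks` stands in for whatever iterator the streaming call hands you, and `receive_and_verify` is a made-up name, not part of any library.

```python
import hashlib
import os
import tempfile
from typing import Iterable


def receive_and_verify(chunks: Iterable[bytes], expected_sha256: str, dest_path: str) -> None:
    """Write chunks to a temp file while hashing them, then publish the
    file at dest_path only if the full-transfer digest matches."""
    hasher = hashlib.sha256()
    dest_dir = os.path.dirname(dest_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                hasher.update(chunk)  # hash incrementally, no second pass over the data
                tmp.write(chunk)      # buffer to disk at the same time
        if hasher.hexdigest() != expected_sha256:
            raise ValueError("checksum mismatch")
        os.replace(tmp_path, dest_path)  # atomic rename within the same directory
    except BaseException:
        # Drop the partial or unverified file before re-raising.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The data still can't be used until the last chunk lands, as in point 1, but the verification itself costs nothing extra once the download finishes.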