	by hansvm 549 days ago
> efficiency

State of the art for both gzipped json and protobufs is a few GB/s. Details matter (big strings, arrays, and binary data will push protos to 2x-10x faster in typical cases), but it's not the kind of landslide victory you'd get from a proper binary protocol. There isn't much need to feel like you're missing out.


jlouis 548 days ago

The big problem with Gzipped JSON is that once unzipped, it's gigantic. And you have to parse everything, even if you just need a few values. Just the memory bottleneck of having to munch through a string in JSON is going to slow down your parser by a ton. In contrast, a string in Protobuf is length-encoded.

5-10x is not uncommon, and that's kissing an order of magnitude difference.


hansvm 548 days ago

> have to parse everything, even for just a few values

That's true of protobufs as much as it is for json, except for skipping over large submessages.

> memory bottleneck

Interestingly, JSON, gzipped JSON, and protobufs are all core-bound parsing operations. The culprit is, mostly, a huge data dependency baked into the spec. You can unlock another multiplicative 10x-30x just with a better binary protocol.

> 5-10x is not uncommon

I think that's in line with what I said. You typically see 2x-10x, sometimes more (arrays of floats, when serialized using the faster of many equivalent protobuf wire encodings, are pathologically better for protos than gzipped JSON), sometimes less. They were aware of and worried about some sort of massive perf impact, and were choosing to avoid protos anyway for developer ergonomics, so I chimed in with some typical perf numbers. It's better (perf-wise) than writing a backend in Python, but you'll probably still be able to measure the impact in real dollars if you have 100k+ QPS.


imtringued 548 days ago

Yeah, this is something people don't seem to want to get into their heads. If all you care about is minimizing transferred bytes, then gzip+JSON is actually surprisingly competitive, to the point where you probably shouldn't even bother with anything else.

Meanwhile if you care about parsing speed, there is MessagePack and CBOR.

If any form of parsing is too expensive for you, you're better off with FlatBuffers and capnproto.

Finally there is the holy grail: Use JIT compilation to generate "serialization" and "deserialization" code at runtime through schema negotiation, whenever you create a long lived connection. Since your protocol is unique for every (origin, destination) architecture+schema tuple, you can in theory write out the data in a way that the target machine can directly interpret as memory after sanity checking the pointers. This could beat JSON, MessagePack, CBOR, FlatBuffers and capnproto in a single "protocol".

And then there is protobuf/grpc, which seems to be in this weird place, where it is not particularly good at anything.


lowbloodsugar 549 days ago

Except gzip is tragically slow, so crippling protobuf by running it through gzip could indeed slow it down to json speeds.


hansvm 549 days ago

"gzipped json" vs "protobuf"


lowbloodsugar 549 days ago

Then something is very wrong.


hansvm 548 days ago

Protobufs have a massive data dependency baked into the wire format, turning parsing into an intensive core-bound problem.

Interestingly, they're not usually smaller than gzipped JSON either (the compression they have built in is pretty rudimentary), so if you don't compress them and don't have a stellar network you might actually pay more for the total transfer+decode than with gzipped JSON, despite protos usually being somewhat faster to parse.


lowbloodsugar 548 days ago

Got any references to share?


hansvm 547 days ago

The docs [0] are fairly straightforward. I'll spit out a little extra data and a few other links in case it's helpful. If this is too much or not enough text, feel free to ask followup questions.

As far as data dependencies are concerned, you simply can't parse a byte till you've parsed all the preceding bytes at the same level in a message.

A naive implementation would (a) varint decode at an offset, (b) extract the tag type and field index, (c) use that to parse the remaining data for that field, (c1) the exact point in time you recurse for submessages doesn't matter much, but you'll have to eventually, (d) skip forward the length of the field you parsed, (e) if not done then go back to (a).
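To make steps (a) through (e) concrete, here is a minimal sketch of that naive loop in Python. It only handles the VARINT, I64, LEN, and I32 wire types described in the encoding docs [0], it does not recurse into submessages, and the names `read_varint` and `parse_fields` are made up for illustration rather than taken from any protobuf library.

```python
from typing import Iterator, Tuple, Union

# Wire type numbers as listed in the protobuf encoding docs.
VARINT, I64, LEN, I32 = 0, 1, 2, 5


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """(a) Decode one base-128 varint at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:  # continuation bit clear: this was the last byte
            return value, pos
        shift += 7


def parse_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):  # (e) loop until the message is consumed
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7  # (b) split the tag
        value: Union[int, bytes]
        if wire_type == VARINT:  # (c) payload is itself a varint
            value, pos = read_varint(buf, pos)
        elif wire_type == LEN:  # (c) length prefix, then raw bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length  # (d) skip past the payload
        elif wire_type == I64:
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == I32:
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value
```

Every iteration needs `pos` from the previous one before it can even find the next tag, which is the data dependency in question. Feeding it the bytes `08 96 01 12 07 74 65 73 74 69 6e 67` (the docs' examples concatenated) yields field 1 as the varint 150 and field 2 as the bytes `testing`.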

You can do better, but not much better, because the varints in question hold up to 8-byte (64-bit) values, requiring up to 10 bytes on the wire, meaning AVX2 SIMD shenanigans (32-byte registers) can only guarantee that you parse 3 varints at a time. That's fine and dandy, except most fields look like 2 varints followed by some binary data, so all you're really saying is that you can only parse one field at a time and still have to skip forward an unpredictable amount after a very small number of bytes/instructions.

If you have more specialized data (e.g., you predict that all field indexes are under 32 and all fields are of type "LENGTH"), then there are some tricks you can do to speed it up a bit further. Doing so adds branches to code which is already very branchy and data-dependent though, so it's pretty easy to accidentally slow down parsing in the process.

Something close to the SOTA for varint decoding (a sub-component of protobuf parsing) is here [1]. It's quite fast (5-10 GB/s), but it relies on several properties that don't actually apply to the protobuf wire format, including that their varints are far too small and they're all consecutively concatenated. The SOTA for protobuf parsing is much slower (except for the sub-portions that are straight memcopies -- giant slices of raw data are fairly efficient in protos and not in JSON).

This isn't the best resource [2], but it's one of many similar examples showing people not finding protos substantially faster in the wild, partly because their protos were bigger than their json objects (and they weren't even gzipping -- the difference there likely comes from the tag+length prefix structure being more expensive than delimiters, combined with fixed-width types favoring json when the inputs are small). AFAICT, their json library isn't even simdjson (or similar), which ought to skew against protos even further if you're comparing optimal implementations.

In terms of protos being larger than gzipped json, that's just an expected result for almost all real-world data. Protobuf adds overhead to every field, byte-compresses some integers, doesn't compress anything else, and doesn't bit-compress anything. Even if your devs know not to use varint fields for data you expect to be negative any fraction of the time, know to use packed arrays, ..., the ceiling on the format (from a compression standpoint) is very low unless your data is mostly large binary blobs that you can compress before storing in the protobuf itself.
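As a rough illustration of the "byte-compresses some integers" point and the negative-number pitfall, the sketch below encodes a few values the way the encoding docs [0] describe varints and ZigZag. The helper names (`encode_varint`, `encode_int64`, `encode_sint64`) are hypothetical and not part of any protobuf library.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("encode_varint expects a non-negative integer")
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)  # more bytes follow
        else:
            out.append(low)
            return bytes(out)


def encode_int64(value: int) -> bytes:
    """int64-style encoding: negatives go out as 64-bit two's complement."""
    return encode_varint(value & MASK64)


def zigzag64(value: int) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((value << 1) ^ (value >> 63)) & MASK64


def encode_sint64(value: int) -> bytes:
    """sint64-style encoding: ZigZag first, then varint."""
    return encode_varint(zigzag64(value))


if __name__ == "__main__":
    for v in (1, 150, -1):
        print(v, len(encode_int64(v)), len(encode_sint64(v)))
```

With a plain varint field, -1 costs 10 bytes on the wire, while the ZigZag variant costs 1. Either way the granularity is a whole byte per 7 bits of payload, which is why a general-purpose compressor run over JSON often ends up smaller.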

For a few other random interblags comparisons, see [3], [4]. The first finds protos 3x-6x faster (better for deserializing than serializing) compared to json. The second finds that protos compress better than json, but also that compressed json is much smaller than ordinary protos for documents more than a few hundred bytes (so to achieve the size improvements you do have to "cripple" protos by compressing them).

If you start looking at the comparisons people have done between the two, you'll find results largely consistent with what I've been saying: (1) Protos are 2x-10x faster for normal data, (2) protos are usually larger than gzipped json, (3) protos are sometimes slower than gzipped JSON, (4) when you factor in sub-par networks, the total transfer+decode time can be much worse for protos because of them being larger.

As a fun experiment, try optimizing two different programs. Both operate on 1MB of pseudo-random bytes no greater than 10. Pick any cheap operation (to prevent the compiler from optimizing the iteration away) like a rolling product mod 256, and apply that to the data. For the first program (simulating a simplified version of the protobuf wire format), treat the first byte as a length and the next "length" bytes as data, iterating till you're done. For the second, treat all bytes as data. Using a systems language on any modern CPU, you'll be hard-pressed to get an optimized version of the length-prefixed code to run within 10x of an un-optimized version of the raw-data loop.
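For anyone who wants a starting point, here is a sketch of the two loops in Python. It only pins down what each program computes. Interpreter overhead will swamp the effect being described, so port both functions to a systems language before timing anything. The function names and the fixed seed are arbitrary choices for illustration.

```python
import random


def make_input(n: int = 1_000_000, seed: int = 0) -> bytes:
    """n pseudo-random bytes, each in the range 0..10."""
    rng = random.Random(seed)
    return bytes(rng.randint(0, 10) for _ in range(n))


def rolling_product_raw(data: bytes) -> int:
    """Second program: every byte is data."""
    acc = 1
    for b in data:
        acc = (acc * b) % 256
    return acc


def rolling_product_length_prefixed(data: bytes) -> int:
    """First program: one length byte, then that many data bytes, repeated."""
    acc = 1
    pos = 0
    n = len(data)
    while pos < n:
        length = data[pos]  # where the next record starts depends on this read
        pos += 1
        end = min(pos + length, n)  # tolerate a truncated final record
        for b in data[pos:end]:
            acc = (acc * b) % 256
        pos = end
    return acc


if __name__ == "__main__":
    data = make_input()
    print(rolling_product_raw(data), rolling_product_length_prefixed(data))
```

The raw loop has a fixed trip count that a compiler can unroll and vectorize. The length-prefixed loop has to load a length before it knows where the next record begins, so the inner loops are short and unpredictable.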

Cap'n Proto and FlatBuffers (whether gzipped or not), as examples, are usually much faster than both JSON and protobufs -- especially for serialization, and to a lesser extent deserialization -- even when you're parsing the entire message (they shine comparatively even more if you're extracting sub-components of a message). Cap'n Proto was made by the primary author of protobuf v2, and he learned from some of its mistakes. "Proper" binary formats (like those, though they're by no means the only options) take into account data dependencies and other features of real hardware and are much closer to being limited by RAM bandwidth instead of CPU cycles.

[0] https://protobuf.dev/programming-guides/encoding/

[1] https://www.bazhenov.me/posts/rust-stream-vbyte-varint-decod...

[2] https://medium.com/@kn2414e/is-protocol-buffers-protobuf-rea...

[3] https://medium.com/streamdal/protobuf-vs-json-for-your-event...

[4] https://nilsmagnus.github.io/post/proto-json-sizes/
