by zheng
4664 days ago
The claims on this site are pretty impressive, but I have close to zero knowledge of the history here, so can someone comment on how many grains of salt this should be taken with? Otherwise, this looks pretty cool. Something that beats protobufs in overall speed could be really helpful depending on the application.
Cap'n Proto's key design characteristic is to use the same encoding on the wire as in memory. Protobufs, by contrast, have a wire format that is essentially a flat sequence of fieldnum/value pairs.
The fieldnum/value pairs can come in any order, and may define as many or as few of the declared fields as are present. This serialization format doesn't work for in-memory usage because for general programming you need O(1) access to each value, so protobufs have a "parse" step that unpacks this into a C++ class where each field has its own member.
Protobufs are heavily optimized so this parsing is fast, but it's still a very noticeable cost in high-volume systems. So Cap'n Proto defines its wire format such that it also has O(1) access to arbitrary fields. This makes it suitable as an in-memory format as well.
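To make the difference concrete, here is a rough Python sketch of the two access patterns. It is a toy, not either project's real encoding: the 1-byte field number, the fixed 8-byte values, the 4-field schema, and all function names are assumptions made up for illustration (real protobufs use varint tags, and Cap'n Proto's actual layout is described in its encoding docs).

```python
import struct

NUM_FIELDS = 4  # hypothetical schema: fields numbered 0..3, all uint64

def encode_tagged(fields):
    """Toy tag/value encoding: 1-byte field number, then an 8-byte value."""
    return b"".join(struct.pack("<BQ", num, val) for num, val in fields)

def parse_tagged(buf):
    """The 'parse' step: every pair must be walked before fields are usable."""
    out = {}
    for off in range(0, len(buf), 9):  # 9 = 1-byte tag + 8-byte value
        num, val = struct.unpack_from("<BQ", buf, off)
        out[num] = val
    return out

def encode_fixed(fields):
    """Toy fixed layout: field n always lives at byte offset 8 * n."""
    buf = bytearray(8 * NUM_FIELDS)  # absent fields stay zero-filled
    for num, val in fields:
        struct.pack_into("<Q", buf, 8 * num, val)
    return bytes(buf)

def read_fixed(buf, num):
    """O(1) access straight from the wire bytes, no parse step."""
    return struct.unpack_from("<Q", buf, 8 * num)[0]

fields = [(2, 7), (0, 42)]  # pairs may arrive in any order
tagged = encode_tagged(fields)
fixed = encode_fixed(fields)
print(parse_tagged(tagged)[2], read_fixed(fixed, 2))
```

The tagged buffer only carries the two fields that are set, but nothing can be read from it until the whole thing has been walked. The fixed buffer is readable immediately, at the price of carrying zeroed slots for fields 1 and 3.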
While this avoids a parsing step, it also means that your wire format has to preserve the empty spaces for fields that aren't present. So to get the "infinitely faster" advantage, you have to accept this cost. For dense messages, this can actually be smaller than the comparable protobuf because you don't have to encode the field numbers. But for very sparse messages, this can be arbitrarily larger.
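A back-of-the-envelope version of that size tradeoff, under the same toy assumptions (every value is 8 bytes, every tag is 1 byte; neither number is taken from either real format, and the function names are hypothetical):

```python
TAG_SIZE = 1    # assumed bytes per field number in the tagged format
VALUE_SIZE = 8  # assumed bytes per value in both formats

def tagged_size(num_present):
    """Tagged format pays tag + value, but only for fields that are set."""
    return num_present * (TAG_SIZE + VALUE_SIZE)

def fixed_size(num_declared):
    """Fixed layout pays for every declared field, set or not."""
    return num_declared * VALUE_SIZE

declared = 100
for present in (100, 2):
    print(present, "present:", tagged_size(present), "vs", fixed_size(declared))
```

With all 100 fields set, the fixed layout comes out smaller (800 vs 900 bytes) because it carries no tags. With only 2 set, it is still 800 bytes against 18, and that gap grows without bound as the schema gets larger.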
As Kenton points out on http://kentonv.github.io/capnproto/encoding.html, lots of zeros compress really well, so even sparse messages can become really small by compressing them. To do this you lose "infinitely faster", but according to Kenton this is still faster than protobufs.
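A quick way to see the "zeros compress well" effect is to build a mostly empty fixed-layout buffer and run it through a general-purpose compressor. This sketch uses zlib purely as a stand-in; it is not the scheme described on the encoding page, and the 1000-field schema and function name are made up for the example.

```python
import struct
import zlib

NUM_FIELDS = 1000  # hypothetical schema of 1000 uint64 fields

def sparse_message():
    """Fixed-layout buffer with only two of the 1000 fields set."""
    buf = bytearray(8 * NUM_FIELDS)
    struct.pack_into("<Q", buf, 8 * 3, 42)
    struct.pack_into("<Q", buf, 8 * 500, 7)
    return bytes(buf)

raw = sparse_message()
compressed = zlib.compress(raw)

# The round trip is lossless, but the message must now be decompressed
# before any field can be read, which is the cost mentioned above.
assert zlib.decompress(compressed) == raw
print(len(raw), "bytes raw,", len(compressed), "bytes compressed")
```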
In both cases, though, the tight coupling between the (uncompressed) wire format and the in-memory format imposes certain constraints on your application with regard to memory management and the mutation patterns the struct will allow. For example, it appears that the in-memory format was not sufficiently flexible for Python to wrap it directly, so the Python extension does in fact have a parse step.
Other cases where you could need a parse/serialize step anyway: if you want to put the wire data into a specialized container like a map or set (or your own custom data classes), or if the supported built-in mutation patterns are not flexible enough for you (for example, the Cap'n Proto "List" type appears to have limitations on how and when a list can grow in size).
It's very cool work, but I don't believe it obsoletes Protocol Buffers. I'm actually interested in making the two interoperate, along with JSON -- these key/value technologies are so similar in concept and usage that I think it's unfortunate they don't interoperate better.