Hacker News
by jakjak123 670 days ago
Is it a protobuf-based database store, or is it a database that uses gRPC as its connection protocol? It could be a bit clearer from the front page.

> protobuf

Such an awful format. I wish people would stop using it.

I think it has many good parts. There are waaaaay worse formats I have worked with over the years, like COM, Java RMI, many variants of SOAP, and handcrafted JSON in countless variants…
Why?

The format is not that bad.

The bindings/libraries, OTOH, are often awful, and they often require unnecessary full-message deserialization.

It's not self-describing. If it were just the field names I could deal with that, but even the values are ambiguous, since the same wire type is used for bytes and embedded messages. The worst part is that the 3-bit wire type has two unused values (6 and 7), so they easily could have added a wire type for embedded messages.
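The ambiguity can be shown with a minimal hand-rolled sketch of the protobuf wire format (no protobuf library involved). The two schemas in the comments are hypothetical: one declares field 2 as an embedded message, the other as `bytes`. Both produce identical bytes on the wire because both use wire type 2 (LEN).

```python
# Minimal hand-written sketch of the protobuf wire format, for illustration only.
# Wire types: 0 VARINT, 1 I64, 2 LEN, 3 SGROUP, 4 EGROUP, 5 I32; 6 and 7 are unused.

def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte except the last."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def key(field_number: int, wire_type: int) -> bytes:
    # Each field starts with (field_number << 3) | wire_type, encoded as a varint.
    return encode_varint((field_number << 3) | wire_type)


def len_field(field_number: int, payload: bytes) -> bytes:
    # Wire type 2 (LEN) is shared by string, bytes, embedded messages and packed repeated fields.
    return key(field_number, 2) + encode_varint(len(payload)) + payload


# Hypothetical schema A: message Inner { int32 a = 1; }  message Outer { Inner inner = 2; }
inner = key(1, 0) + encode_varint(150)   # Inner{a: 150}
as_message = len_field(2, inner)

# Hypothetical schema B: message Outer { bytes blob = 2; }
as_bytes = len_field(2, b"\x08\x96\x01")

print(as_message.hex())          # 1203089601
print(as_message == as_bytes)    # True
```

Without the `.proto` file, a reader only sees "field 2, LEN, 3 bytes" and has to guess whether to recurse into the payload.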
Self-describing is pointless for serializations. There is a great deal of history here. ASN.1 has self-describing encoding rules such as BER/DER/CER, XER (XML), JER (JSON), and GSER (never mind), and it has non-self-describing serializations like PER (packed encoding rules) and OER (octet encoding rules). XML and JSON are self-describing, naturally. Fast Infoset is a PER-based fast encoding for XML, because it turns out that XML is slow (imagine that). XDR is a non-self-describing serialization format that resembles OER but with 4-octet alignment. FlatBuffers is essentially an OER-ish encoding for the same IDL as protobufs, and is much better than protobufs.
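For contrast, here is a minimal sketch of a non-self-describing encoding in the XDR style, following the RFC 4506 rules for `int` (4-byte big-endian) and `string` (4-byte length, data, zero padding to a 4-byte boundary). The struct is hypothetical. Nothing in the output names or types the fields, so the decoder only works because it already knows the layout.

```python
import struct


def xdr_int(n: int) -> bytes:
    # XDR int: 32-bit big-endian two's complement.
    return struct.pack(">i", n)


def xdr_string(s: str) -> bytes:
    # XDR string: 4-byte unsigned length, the bytes, then zero padding to a multiple of 4.
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * pad


def decode_user(buf: bytes) -> tuple[int, str]:
    # The reader must already know the layout: an int first, then a string.
    (uid,) = struct.unpack_from(">i", buf, 0)
    (length,) = struct.unpack_from(">I", buf, 4)
    return uid, buf[8:8 + length].decode("ascii")


# Hypothetical XDR definition: struct user { int id; string name<>; };
encoded = xdr_int(7) + xdr_string("bob")
print(encoded.hex())          # 0000000700000003626f6200
print(decode_user(encoded))   # (7, 'bob')
```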

It would be nice if the next serialization format either is truly original or just solves problems that somehow none of the many existing schemes do.

How many serialization formats are there? See: https://en.wikipedia.org/wiki/Comparison_of_data-serializati... (which is NOT a complete list).

> Self-describing is point-less for serializations

You couldn't be more wrong. What happens when you lose the schema, or never had access to it in the first place? Think about it from a reverse engineer's point of view.
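To see what a reverse engineer actually gets, here is a minimal schema-less walker over protobuf wire data, similar in spirit to what `protoc --decode_raw` prints (this is not its implementation). It recovers field numbers, wire types and raw values, but for a LEN field it cannot tell a string, bytes, or an embedded message apart. Groups (wire types 3 and 4) and truncated input are not handled in this sketch.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def walk(buf: bytes) -> list[tuple[int, int, object]]:
    """List (field_number, wire_type, raw_value) for each top-level field, with no schema."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # VARINT: int32/int64/uint/bool/enum/sint all look alike
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64: fixed64, sfixed64 or double
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN: string, bytes, embedded message or packed repeated
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # I32: fixed32, sfixed32 or float
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((field_number, wire_type, value))
    return fields


msg = bytes.fromhex("1203089601")
print(walk(msg))       # [(2, 2, b'\x08\x96\x01')]
print(walk(msg[2:]))   # [(1, 0, 150)], but only a guess that field 2 held a message
```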

I imagine that when someone is choosing a serialization solution, they probably don't care much about people trying to reverse-engineer it... And if they did, they would just make the schemas available instead.

And if you lose your own schema, then you probably have more serious underlying problems.

Why would you lose it? NFS never lost its XDR schema, for example. Do you have any examples where the schema got lost?
Protobuf is a tag-length-value (TLV) encoding. It's bad. TLV is the thing that everyone loves to hate about ASN.1's DER.
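For reference, here is a minimal sketch of DER's tag-length-value layout for a hypothetical `SEQUENCE { INTEGER, OCTET STRING }`: every value, including the enclosing SEQUENCE, carries a tag byte and a length. Only short-form lengths (under 128 bytes) and non-negative integers are handled. In protobuf, by comparison, the key combines field number and wire type, and only wire type 2 (LEN) fields carry an explicit length.

```python
def der_tlv(tag: int, value: bytes) -> bytes:
    # Short-form length only, to keep the sketch small.
    if len(value) >= 128:
        raise ValueError("long-form lengths not handled in this sketch")
    return bytes([tag, len(value)]) + value


def der_integer(n: int) -> bytes:
    if n < 0:
        raise ValueError("negative integers not handled in this sketch")
    # Minimal big-endian two's complement: a leading 0x00 is added when the high bit would be set.
    return der_tlv(0x02, n.to_bytes((n.bit_length() + 8) // 8, "big"))


def der_octet_string(b: bytes) -> bytes:
    return der_tlv(0x04, b)


def der_sequence(*items: bytes) -> bytes:
    # The SEQUENCE's own length covers all nested TLVs.
    return der_tlv(0x30, b"".join(items))


encoded = der_sequence(der_integer(5), der_octet_string(b"hi"))
print(encoded.hex())   # 300702010504026869
```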
> It's bad

I'm not sure that's a helpful way to look at things.

There are tradeoffs. Can you elaborate on which aspects of a TLV encoding you find problematic? Is it decoding speed? The need to copy the encoded value into a native value before you can use it? Something else?

TLV encodings are always redundant and take more space than non-TLV encodings, so they are a pessimization. Also, definite-length TLV encodings require two passes to encode the data (one to compute the length of the data to be encoded and one to encode it), so they are (a) slower than non-TLV binary encodings and (b) not online for encoding, since a value's length must be known before any of its bytes can be written.
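The two-pass point can be illustrated with a minimal sketch of a protobuf-style encoder for nested messages. The message representation (a list of `(field_number, value)` pairs, where a value is a non-negative int or a nested list) is hypothetical. Pass 1 computes sizes so that pass 2 can write each nested message's length before its payload.

```python
def varint(n: int) -> bytes:
    """Base-128 varint for non-negative ints."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def varint_size(n: int) -> int:
    # Number of 7-bit groups needed; zero still takes one byte.
    return max(1, (n.bit_length() + 6) // 7)


def size(msg: list) -> int:
    """Pass 1: compute the encoded size without writing anything."""
    total = 0
    for field, value in msg:
        if isinstance(value, int):
            total += varint_size(field << 3) + varint_size(value)
        else:
            inner = size(value)
            total += varint_size((field << 3) | 2) + varint_size(inner) + inner
    return total


def write(msg: list, out: bytearray) -> None:
    """Pass 2: emit bytes. A nested message's length must be known before its payload."""
    for field, value in msg:
        if isinstance(value, int):
            out += varint(field << 3) + varint(value)
        else:
            out += varint((field << 3) | 2) + varint(size(value))
            write(value, out)


msg = [(1, 150), (2, [(1, 150)])]   # hypothetical message with one nested message
out = bytearray()
write(msg, out)
print(out.hex(), size(msg))   # 0896011203089601 8
```

This naive version recomputes nested sizes inside `write`; a production encoder would cache the sizes from pass 1. Encoding into a temporary buffer and then copying it behind a length prefix avoids the size pass but pays for the copy instead.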
Yes, all of those are indeed pessimizations during encoding, but they are features when decoding: the decoder can skip fields without decoding them and can tolerate unknown fields.

Now, you may disagree that tolerating unknown fields is a feature (as many people do), but one must understand the context in which protobuf was designed: it takes time to roll out new versions of the binaries that process the data (whether in API calls or in stored files), so the ability to evolve a schema with backward and forward compatibility is worth a few extra cycles during encoding.

Not all users have that need, which is why other formats exist, but I wouldn't flatly dismiss the protobuf encoding as "wrong" just because you don't have the requirements it was designed for.
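The decoding side of that tradeoff can be sketched as a hypothetical "v1" reader that only knows field 1 and skips everything else using the wire type alone. The message bytes are hand-built to stand in for a newer writer that added fields 2 and 3. Groups (wire types 3 and 4) are not handled.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def skip_field(buf: bytes, pos: int, wire_type: int) -> int:
    # The wire type alone says how many bytes to skip; no schema needed.
    if wire_type == 0:
        return read_varint(buf, pos)[1]
    if wire_type == 1:
        return pos + 8
    if wire_type == 2:
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == 5:
        return pos + 4
    raise ValueError(f"wire type {wire_type} not handled in this sketch")


def parse_v1(buf: bytes) -> dict:
    """Hypothetical v1 reader: knows only field 1 (an id, varint)."""
    result = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 1 and wire_type == 0:
            result["id"], pos = read_varint(buf, pos)
        else:
            pos = skip_field(buf, pos, wire_type)  # unknown to v1: skip it
    return result


# Written by a hypothetical v2: id = 150, plus new field 2 = "hi" and field 3 = 1.
v2_bytes = bytes.fromhex("089601" "12026869" "1801")
print(parse_v1(v2_bytes))   # {'id': 150}
```

A real protobuf runtime typically preserves such unknown fields rather than dropping them, so they survive a decode and re-encode round trip.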

What's an alternative you would recommend?
Anything length-prefixed, with no schema.
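One possible reading of "length-prefixed, with no schema" is a stream of records where each item is just a length followed by raw bytes. A minimal sketch, assuming a 4-byte big-endian length prefix (an arbitrary choice for illustration):

```python
import struct


def encode_frames(items: list[bytes]) -> bytes:
    # Each item: 4-byte big-endian length, then the raw bytes. No tags, no types.
    return b"".join(struct.pack(">I", len(x)) + x for x in items)


def decode_frames(buf: bytes) -> list[bytes]:
    items, pos = [], 0
    while pos < len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items.append(buf[pos:pos + length])
        pos += length
    return items


frames = encode_frames([b"alice", b"42"])
print(decode_frames(frames))   # [b'alice', b'42']
```

The decoder recovers the items, but which item is which, and that `b"42"` is meant as a number, is left to a convention shared by writer and reader.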
How do you handle schema instead?