Hacker News
by dkopi 3616 days ago
> Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

From https://developers.google.com/protocol-buffers/

1 comments

Yes, I read this. It tells me what Protocol Buffers are: faster, smaller, XML-like data structures for serialisation. What are the most common use cases though? And do people only use them for performance reasons?
The most common use cases line up with those of JSON: communication between programs that don't share an address space. The main advantage over JSON (in my opinion) is the definition of an explicit schema. The second (and also important) advantage is in the efficient size of the serialized data, which limits memory, disk, and bandwidth usage. Another (less important to me) advantage is in serialization and deserialization efficiency. A disadvantage is that it requires deserialization for human inspection - that is, it isn't plain text like JSON or XML.
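To make the size point concrete, here is a rough sketch that hand-encodes a two-field message in the protobuf wire format and compares it with compact JSON. The message layout (field 1 = id, field 2 = name) and the helper names are made up for illustration; real code would use classes generated by protoc rather than encoding by hand, and this sketch only handles non-negative integers.

```python
import json


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, high bit set on every byte but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    # Tag = (field_number << 3) | wire_type; wire type 0 is a varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, value: str) -> bytes:
    # Wire type 2 is length-delimited: tag, byte length, then the UTF-8 bytes.
    data = value.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical message: field 1 = id (int), field 2 = name (string).
record = {"id": 150, "name": "alice"}

as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
as_proto = encode_int_field(1, record["id"]) + encode_string_field(2, record["name"])

print(len(as_json), as_json)    # 25 bytes
print(len(as_proto), as_proto)  # 10 bytes
```

The saving comes mostly from replacing field names with small numeric tags, which is also why you need the schema to make sense of the bytes.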

It is similar to Apache Thrift, if you're looking for a non-Google project with similar ideas.

Serialization and deserialization efficiency is especially important for mobile apps, where the CPU time spent parsing and serializing JSON (or gzipped JSON) can become very prominent.

Apache Thrift, IIRC, is actually a reimplementation of protos, in the same way that Facebook's Buck is of Google's Bazel.

I have sometimes looked at "raw" binary protos to inspect the string fields, which happen(ed?) to be byte-aligned and so readable in a text editor. Not sure off the top of my head if that's always the case.
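A small sketch of why that works: in the wire format a string field is written as a tag, a byte length, and then the UTF-8 bytes unchanged, so ASCII text shows up verbatim. The bytes below are hand-built for illustration (field 1 = 150, field 2 = "testing"), and printable_runs is a hypothetical helper in the spirit of the Unix strings tool, not part of any protobuf library. Non-ASCII text or compressed payloads would not be readable this way.

```python
def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII at least min_len bytes long."""
    runs = []
    current = bytearray()
    for b in data:
        if 0x20 <= b <= 0x7E:
            current.append(b)
        else:
            if len(current) >= min_len:
                runs.append(current.decode("ascii"))
            current = bytearray()
    if len(current) >= min_len:
        runs.append(current.decode("ascii"))
    return runs


# Hand-built message: 08 96 01 is field 1 (varint 150);
# 12 07 is the tag and length for field 2, followed by the string bytes.
raw = bytes.fromhex("089601") + b"\x12\x07testing"

print(b"testing" in raw)    # True
print(printable_runs(raw))  # ['testing']
```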

Performance is a nice benefit, but the standardization of message passing is by far the biggest benefit in my opinion. Within a given language, I know that any API I call will have certain unvarying semantics, I can see a highly readable yet formal spec of the data being exchanged, and the code to manipulate these messages will always be familiar and idiomatic.

Duplicating these benefits with XML or JSON would require defining your own grammar and parser, but wouldn't have the performance benefits. Recreating the performance gains would require a new serialization scheme, at which point you'd have broken from JSON and XML standard tools and recreated protobufs in everything but the proto definition language; at that point, why not create a DSL rather than bolting this functionality onto an existing one?

In addition to being smaller and faster than XML, protobufs make it extremely easy to declare the schema of data, validate data, and version your schema. Then the generated wrappers and static type checking in various languages add additional guarantees that you're using the data correctly.
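On the versioning point, a minimal sketch of why adding a field doesn't break old readers: every field on the wire carries its number and wire type, so a reader can step over fields it doesn't know about. The decoder below is a toy that only handles varint and length-delimited fields, and the field numbers and function names are invented for illustration; real generated code covers more wire types and edge cases.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:  # high bit clear means this was the last byte
            return result, pos
        shift += 7


def decode_known(buf: bytes, known: set) -> dict:
    """Decode fields whose numbers are in `known`; skip everything else."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited (strings, bytes, nested messages)
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            fields[number] = value
    return fields


# A "v2" writer added field 3 (a hypothetical role string) to the message.
v2_message = bytes.fromhex("089601") + b"\x12\x05alice" + b"\x1a\x05admin"

print(decode_known(v2_message, {1, 2}))     # old reader: {1: 150, 2: b'alice'}
print(decode_known(v2_message, {1, 2, 3}))  # new reader also sees 3: b'admin'
```

The flip side is that field numbers become part of the contract: reusing a number for a different meaning is what actually breaks compatibility.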

Plain XML still requires a lot of work to ensure compatibility when it's used across multiple places; protobufs attempt to minimize many sources of incompatibility.

Add in a bunch of tools such as protobuf->JSON, protobuf plaintext serialization, etc., and it becomes more difficult to argue for using something such as XML or vanilla JSON.

Flatbuffers are still a nice solution for more performance-critical applications.

As an example, they power Google's RPC system (usually referred to as Stubby in the literature, recently open sourced as gRPC http://www.grpc.io/)
gRPC is based on Stubby: http://www.grpc.io/posts/principles
Stubby is based on protobuf, where by "based" I mean layered on, i.e. protobuf is the encoding used by Stubby to encode requests and responses.

gRPC is a reimplementation of Stubby suitable to be used outside of Google.

Yes, I think you are just using a different sense of "based on" than I am. gRPC is based on Stubby in the sense that it is influenced by the design of Stubby and uses the knowledge learned from creating Stubby.
RPC. Streaming.