by morelisp 1375 days ago
Protobuf isn't too complicated; I've found the wire format docs to be some of the best among the avro/msgpack/thrift/etc competitors.

Maybe you mean something besides the wire format. In that case, good luck, because that shit ain't protobuf.


The wire format is fairly straightforward if you've seen a few binary encodings. The language used to write the schemas isn't quite as simple and regular as you might hope, though.

> Maybe you mean something besides the wire format. In that case, good luck, because that shit ain't protobuf.

Naming's hard :) Being really pedantic, I think even Google calls the schema description language "Protocol Buffers" and uses phrases like "the Protobuf binary format" or "the Protocol Buffer wire format" to refer to the wire format. Colloquially, it's never confused me to just use "Protobuf" for both.

Except I've written thousands of lines of protobuf format handling that never, or only extremely distantly, touch a schema file. But there's no reason you'd ever do the inverse, pushing protobuf schemas around with no intent to handle the wire format. As an abstract data definition format it's exceptionally poor; it only makes sense if you also want to use the wire format (which is... better than poor, especially as the commodity ones go).

> But there's no reason you'd ever do the inverse, pushing protobuf schemas around with no intent to handle the wire format.

You could be writing a linter, a formatter, an implementation of the Language Server Protocol, a compiler that's not protoc, a way to apply semantic patches to large numbers of Protobuf schemas, or any number of other useful tools. There's clearly at least some demand for tools like this - partial implementations of most of these exist, often with some corporate backing.

Unless you're implementing a Protobuf runtime (google.golang.org/protobuf in Go, upb for Python, etc.), your experience seems unusual to me - most developers I've encountered read and write the wire format using one of the existing runtimes.

That said, it does sound like a lot of fun - especially if it's in lisp!

The wire format isn't great compared to these, particularly in its handling of nested messages. Or rather, non-handling, since you have to encode the sub message to a bytestring. That makes encoders a bit slower (or more complex), because the length prefix has to be known before the sub message can be written, and you can't look at a message and see the nesting without a schema. Not great.
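
To make the nesting point concrete, here is a minimal hand-rolled sketch in Python. The helper names (`encode_varint`, `encode_int_field`, `encode_len_field`) and the field numbers are made up for illustration and are not the API of any Protobuf runtime. It only assumes the documented wire rules: a tag is `(field_number << 3) | wire_type`, wire type 0 is a varint, and wire type 2 is a length-prefixed payload.

```python
VARINT, LEN = 0, 2  # wire types used in this sketch


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_int_field(field_number: int, value: int) -> bytes:
    return tag(field_number, VARINT) + encode_varint(value)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    # The payload must be fully encoded first so its length can be written.
    return tag(field_number, LEN) + encode_varint(len(payload)) + payload


# Hypothetical inner message: field 1 = 150.
inner = encode_int_field(1, 150)

# Field 3 of an outer message, once as a "sub message", once as plain bytes.
as_submessage = encode_len_field(3, inner)
as_bytes = encode_len_field(3, b"\x08\x96\x01")

print(as_submessage.hex())        # 1a03089601
print(as_submessage == as_bytes)  # True
```

The two encodings are byte-for-byte identical, which is why a reader without the schema cannot tell a nested message from an opaque bytes or string field.
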
Or super-great, because decoders can't fail on nested messages they don't parse. This is a perennial problem in the large corpus of JSON we parse at my day job, and it would be fairly awesome to treat all nested objects as thunks to be decoded on access instead of crashing when synchronizing a crapton of JSON-encoded application state.
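
A rough sketch of that "decode on access" behaviour, again in hand-rolled Python rather than any real runtime's API. `decode_varint` and `scan_fields` are hypothetical helpers, and the sketch only handles wire types 0 and 2. The outer message scans cleanly even though the nested payload is deliberately malformed; the error only appears when someone actually tries to parse the inner bytes.

```python
def decode_varint(buf: bytes, pos: int):
    """Return (value, next_pos); raise ValueError on a truncated varint."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def scan_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for one message level.

    Length-delimited payloads are returned as raw bytes and not inspected.
    """
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            if pos + length > len(buf):
                raise ValueError("truncated length-delimited field")
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, wire_type, value


# Outer message: field 1 = 1, field 3 = two bytes that are NOT a valid message
# (0x08 starts a varint field, 0xff is a varint byte that never terminates).
outer = bytes.fromhex("0801" "1a02" "08ff")

fields = list(scan_fields(outer))
print(fields)  # [(1, 0, 1), (3, 2, b'\x08\xff')]

try:
    list(scan_fields(fields[1][2]))  # only now is the nested payload parsed
except ValueError as err:
    print("nested decode failed:", err)  # nested decode failed: truncated varint
```

The trade-off is exactly the one argued over in the replies: the outer scan can forward or skip field 3 untouched, but the bad data is only detected at whatever later point the nested bytes are decoded.
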
Funny, I have the opposite view. If you have invalid data, you should fail as early and loudly as possible, not wait until it's decoded later and you have no idea where the faulty data comes from.

I hold both views: if you're going to do something with the data, yes, please reject it as soon as possible. But it sure is nice to be able to build a dumb pipe sometimes, for either efficiency or architecture reasons.