by jhumphries131 1375 days ago
The intent of this spec is to actually put "whatever Google did in protoc" into a readable format, so you don't have to read the C++ code. The official docs fall short of providing many of the details that are included here.

That sounds great, but what's the governance story? Are the authors of the spec document committing to keeping the document up to date here henceforth? Are the protoc maintainers committing to have these folks involved in project direction decisions?
As of right now, the former. That is currently required for our tools to keep functioning (https://buf.build). If/when changes are made to the language, we update our tools (and this spec) so they stay accurate.
Protobuf isn't too complicated. I've found the wire format docs to be some of the best among the avro/msgpack/thrift/etc. competitors.

Maybe you mean something besides the wire format. In that case, good luck, because that shit ain't protobuf.

The wire format is fairly straightforward if you've seen a few binary encodings. The language used to write the schemas isn't quite as simple and regular as you might hope, though.
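
To make "fairly straightforward" concrete, here's a minimal sketch of the two building blocks as I understand them from the public encoding docs: base-128 varints, and a tag that packs the field number and wire type into one varint. The function names are mine, not from any Protobuf runtime, and the sample bytes are the usual "field 1 = 150" example.

```python
def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        # The low 7 bits carry payload, least-significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the varint.
        if not (byte & 0x80):
            return result, pos
        shift += 7
        if shift >= 70:  # more than 10 bytes cannot be a valid 64-bit varint
            raise ValueError("varint too long")


def decode_tag(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Decode a field tag; return (field_number, wire_type, next_pos)."""
    key, pos = decode_varint(buf, pos)
    return key >> 3, key & 0x07, pos


# Field 1, wire type 0 (varint), value 150.
data = bytes([0x08, 0x96, 0x01])
field_number, wire_type, pos = decode_tag(data)
value, pos = decode_varint(data, pos)
print(field_number, wire_type, value)  # 1 0 150
```

Note that nothing here needs a schema: the bytes tell you the field number and wire type, just not the field name or its declared type.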

> Maybe you mean something besides the wire format. In that case, good luck, because that shit ain't protobuf.

Naming's hard :) Being really pedantic, I think even Google calls the schema description language "Protocol Buffers" and uses phrases like "the Protobuf binary format" or "the Protocol Buffer wire format" to refer to the wire format. Colloquially, it's never confused me to just use "Protobuf" for both.

Except I've written thousands of lines of protobuf format handling that never, or only extremely distantly, touch a schema file. But there's no reason you'd ever do the inverse, pushing protobuf schemas around with no intent to handle the wire format. As an abstract data definition format it's exceptionally poor; it only makes sense if you also want to use the wire format (which is... better than poor, especially as the commodity ones go).
> But there's no reason you'd ever do the inverse, pushing protobuf schemas around with no intent to handle the wire format.

You could be writing a linter, a formatter, an implementation of the Language Server Protocol, a compiler that's not protoc, a way to apply semantic patches to large numbers of Protobuf schemas, or any number of other useful tools. There's clearly at least some demand for tools like this - partial implementations of most of these exist, often with some corporate backing.

Unless you're implementing a Protobuf runtime (google.golang.org/protobuf in Go, upb for Python, etc.), your experience seems unusual to me - most developers I've encountered read and write the wire format using one of the existing runtimes.

That said, it does sound like a lot of fun - especially if it's in Lisp!

The wire format isn't great compared to these, particularly in its handling of nested messages. Or rather, non-handling, since you have to encode the submessage to a bytestring first. That makes encoders a bit slower (or more complex), and you can't look at a message and see the nesting without a schema. Not great.
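
A rough sketch of what I mean, with made-up helper names and field numbers chosen only for illustration: a nested message goes out as a length-delimited field (wire type 2), so the encoder needs the child's bytes (or at least its length) before it can write the parent, and the result is byte-for-byte identical to a plain bytes field with the same contents.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Wire type 0: tag, then the value as a varint."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_len_field(field_number: int, payload: bytes) -> bytes:
    """Wire type 2: tag, then the payload length as a varint, then the payload."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


# The child has to be fully encoded before the parent can write its length prefix.
inner = encode_varint_field(1, 150)              # 08 96 01
as_message = encode_len_field(3, inner)          # field 3 holding a nested message
as_bytes = encode_len_field(3, b"\x08\x96\x01")  # field 3 holding opaque bytes

print(as_message.hex())  # 1a03089601
print(as_message == as_bytes)  # True
```

Without the schema, a reader of `1a03089601` can only guess whether field 3 is a message, a string, or raw bytes.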
Or super-great, because decoders can't fail on nested messages they don't parse. This is a perennial problem in the large corpus of JSON we parse at my day job, and it would be fairly awesome to treat all nested objects as thunks to be decoded on access instead of crashing when synchronizing a crapton of JSON-encoded application state.
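
Here's roughly the shape I have in mind, as a toy sketch rather than how any real runtime does it. `iter_fields` and `LazyMessage` are names I made up, and the caller (not the wire format) decides that field 3 should be treated as a nested message. The outer walk only checks the length prefix, so a corrupt child doesn't blow up until someone actually looks inside it.

```python
from typing import Iterator


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint too long")


def iter_fields(buf: bytes) -> Iterator[tuple[int, int, object]]:
    """Yield (field_number, wire_type, value) without looking inside LEN payloads."""
    pos = 0
    while pos < len(buf):
        key, pos = decode_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = decode_varint(buf, pos)
        elif wire_type in (1, 2, 5):  # 64-bit fixed, length-delimited, 32-bit fixed
            if wire_type == 2:
                length, pos = decode_varint(buf, pos)
            else:
                length = 8 if wire_type == 1 else 4
            if pos + length > len(buf):
                raise ValueError("truncated field")
            value = buf[pos:pos + length]  # kept as raw bytes
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field_number, wire_type, value


class LazyMessage:
    """Holds raw bytes and only parses them on first access to .fields."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw
        self._fields = None

    @property
    def fields(self) -> list:
        if self._fields is None:
            self._fields = list(iter_fields(self._raw))
        return self._fields


# Field 1 = 7, then field 3 carrying one byte of garbage (a truncated varint).
outer = bytes([0x08, 0x07, 0x1A, 0x01, 0xFF])
top = list(iter_fields(outer))  # succeeds: the child is never parsed here
nested = LazyMessage(top[1][2])

try:
    nested.fields  # the error only surfaces on access
    child_error = None
except ValueError as exc:
    child_error = str(exc)
```

The dumb-pipe case is just the part before the `try`: forward `top[1][2]` untouched and never touch `.fields`.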
Funny, I have the opposite view. If you have invalid data, you should fail as early and loudly as possible, not wait until it's decoded later and you have no idea where the faulty data comes from.
I have both views - if you're going to do something with the data, yes, please reject it as soon as possible - but it sure is nice to be able to make a dumb pipe sometimes, for either efficiency or architecture reasons.