by kentonv 1659 days ago
Note that Buf is building tooling around Protobuf/gRPC, not replacing them.

Maybe I'm just another clueless millennial developer who doesn't understand the history of the 80's or whatever, but I've never been able to understand this claim that RPC is broken. There's a lot of assertions that everyone knows RPC was broken because smart people in the 80's said so but... no one has ever been able to give me a concrete reason why.

RPC, at least as I've always known it, really just boils down to request/response protocols. You send a request, you get a response. While this is admittedly not the only possible networking pattern, it is the dominant one across almost all distributed systems I've worked with. HTTP itself is request/response -- it's basically the same thing.

All gRPC and Protobuf are doing that is different from HTTP is they're using a binary protocol based on explicitly-defined schemas. The protobuf compiler will take your schemas and generate convenient framework code for you, so you don't have to waste your time on boilerplate HTTP and parsing. And the binary encoding is faster and more compact than text-based encoding like JSON or XML. But this is all convenience and optimization, not fundamentally different on a conceptual level.
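
A minimal sketch of the "schema plus compact binary encoding" point, in Python. This is not the Protobuf wire format; it uses the standard library's `struct` module with a hypothetical fixed schema (`user_id`, `score`, `active`) that both sides are assumed to agree on ahead of time, and compares the result with a JSON encoding of the same values.

```python
import json
import struct

# Hypothetical schema known to both sender and receiver:
# user_id: uint32, score: float64, active: bool (little-endian, no padding).
SCHEMA = struct.Struct("<Id?")


def encode_binary(user_id, score, active):
    """Pack the fields into bytes; field names live in the schema, not on the wire."""
    return SCHEMA.pack(user_id, score, active)


def decode_binary(payload):
    """Unpack bytes back into a (user_id, score, active) tuple."""
    return SCHEMA.unpack(payload)


def encode_json(user_id, score, active):
    """Self-describing text encoding: field names travel with every message."""
    message = {"user_id": user_id, "score": score, "active": active}
    return json.dumps(message).encode("utf-8")


binary_payload = encode_binary(123456, 98.5, True)
json_payload = encode_json(123456, 98.5, True)
```

The binary payload is smaller because the field names and types are carried by the shared schema rather than by each message. Either way it is still a request payload sent over a network, which is the conceptual point above.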

Neither HTTP nor RPC protocols have ever pretended to solve higher-level distributed systems concerns of fault tolerance, network partitions, reliable delivery, etc. Those are things you build on top. You need a basic mechanism to send messages before you can do any of that.

What, exactly, is the magical non-RPC approach of the 80's that we're all missing? Can you explain the alternative?

EDIT: Also, like, the entirety of Google is built out of services that RPC to each other, but the 80's called and said that's wrong? How am I supposed to take this seriously?


Perhaps ChuckMcM means the idea of completely transparent RPC? Treating RPC as something you can throw into a program and have it become distributed without designing for it. There seemed to be a time when that was an expectation. gRPC still needs a lot of extra scaffolding to make a real distributed system.

The term "RPC" can be unpacked with a lot of bad baggage. "remote" can imply unnecessary coupling of knowing the exact receiver. "procedure" can imply a hard guarantee that some side effect took place by the time you get a response.

"non-RPC", as best I can interpret it, means "broadcasting" useful messages / FYIs without much out-of-band coupling and listening for interesting messages. You don't know who's gonna receive the message, what they'll do with it, "when" they'll act on it.

RPC is inspired by "procedure call" on a single CPU, which is the complete opposite. In a "procedure call" you know exactly the implementation you're gonna get, when it will be executed, etc.

You can find glimpses of this in lots of companies, when there's heavy use of a message bus like Kafka. Protobufs as "messages" instead of mere procedure "call" arguments.
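
A rough sketch of that contrast, in Python. The `MessageBus` class and the topic name are hypothetical, and this is a synchronous in-process toy; a real bus like Kafka is asynchronous and durable. The point is only that the publisher gets no return value and does not know who is listening.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process bus: publishers and subscribers only share a topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Deliver to whoever is listening; the publisher gets nothing back.
        for handler in self._subscribers[topic]:
            handler(message)


bus = MessageBus()
audit_log = []
emails_to_send = []

# Two independent listeners; the publisher knows about neither of them.
bus.subscribe("user.created", audit_log.append)
bus.subscribe("user.created", lambda msg: emails_to_send.append(msg["email"]))

result = bus.publish("user.created", {"id": 1, "email": "a@example.com"})
```

With an RPC the caller would name one receiver and wait on its response. Here `publish` returns nothing, and adding or removing a listener does not change the publisher.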

What do you think?

The “promise” of RPC has been that you’re calling a procedure from your code that happens to be on a different piece of equipment. It may be in a completely different memory space and on completely different hardware. So “seamless distributed computing.”

The basic premise being that you specify the interface and you can use tooling to build some skeleton code that makes the code the user writes look like any other code they write, and yet it might magically be running on half a dozen machines.

Of course, there is a real difference between invoking a “procedure call” that is simply a program counter change on the same stack you had before, and one where the parameters are marshaled into a canonical form so that the destination can reliably unmarshal and correctly interpret them, and where the step that had been done by the linker resolving one symbol in your binary is now an active agent that uses yet another protocol at the start of execution to resolve the symbols and plumb the necessary networking code. And then there is the execution itself, which may happen exactly as expected, happen multiple times without you knowing it, or not happen at all.

The minimalist camp, of which I consider myself a member, says “No, you can’t make these seamless, they really are just syntactic sugar that lets you specify a network protocol.” In that simple world you acknowledge, and plan for, any part of the process failing. Your code has failure checks and exceptions that deal with “at most once” or “at least once” semantics, and you write idempotent functions rather than procedures when you can, to minimize the penalty of trying to maintain the illusion of procedure call semantics in what is in fact a network protocol implementation.
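
A minimal sketch of what that planning can look like, in Python. Everything here is hypothetical (the `FlakyServer` class, the `deposit` operation, the request-ID scheme); it simulates a server that applies a request but whose first response is lost, so the client retries under “at least once” semantics and relies on an idempotency key to avoid applying the deposit twice.

```python
import uuid


class FlakyServer:
    """Hypothetical server: applies the first request, then 'loses' the response."""

    def __init__(self):
        self.balance = 0
        self.seen = {}  # request_id -> result already computed for that request
        self.drop_next_response = True

    def deposit(self, request_id, amount):
        # Idempotency check: a retried request returns the stored result
        # instead of applying the side effect a second time.
        if request_id in self.seen:
            return self.seen[request_id]
        self.balance += amount
        self.seen[request_id] = self.balance
        if self.drop_next_response:
            # The side effect happened, but the caller never hears about it.
            self.drop_next_response = False
            raise TimeoutError("response lost")
        return self.balance


def call_with_retry(server, amount, attempts=3):
    """At-least-once delivery: reuse one request ID across all retries."""
    request_id = str(uuid.uuid4())
    for _ in range(attempts):
        try:
            return server.deposit(request_id, amount)
        except TimeoutError:
            continue  # the caller cannot tell whether the deposit was applied
    raise RuntimeError("gave up after retries")


server = FlakyServer()
final_balance = call_with_retry(server, 100)
```

Without the `seen` table the retry would deposit twice. The caller cannot distinguish "request lost" from "response lost", which is exactly the failure mode a local procedure call never has.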

But there is another camp, and from the material Buf has put out they seem to be in that camp, which is “networking is hard and complicated, but we can make it so that developers don’t need to even know they are going over a network. Just use these tools to describe what you want to do and we’ll do all the rest.”

My experience is that obfuscating what is going on under the hood to lower the cognitive load on developers breaks down when trying to distribute systems. That is especially true for languages that don’t explicitly allow for it. Numerous projects, ideas, and companies have crashed on that reef.

And there is this part: “All gRPC and Protobuf are doing that is different from HTTP is they're using a binary protocol based on explicitly-defined schemas. The protobuf compiler will take your schemas and generate convenient framework code for you, so you don't have to waste your time on boilerplate HTTP and parsing. And the binary encoding is faster and more compact than text-based encoding like JSON or XML. But this is all convenience and optimization, not fundamentally different on a conceptual level.”

I agree 100% with that statement, and that is exactly what ONC RPC does, and that is exactly what ASN.1 does, and that is exactly what DCS does. That same wheel, again and again. So what I was suggesting originally is that Buf should try to explain what they are doing that these other systems failed to do, and in that explanation acknowledge the reasons this wheel has been re-invented so many times before, and then explain how they think they are going to make a more durable solution that lasts for more than a few years.

No modern RPC system attempts to transparently emulate a local procedure call. Everyone who is seriously using these systems understands that an RPC is not equivalent to a local call. Everyone understands that the network introduces a host of new failure modes that must be considered, as well as latency and concurrency. RPC systems are used to simplify protocol development but it is understood that these are still protocols and they don't magically solve everything.

> that is exactly what ONC RPC does, and that is exactly what ASN.1 does, and that is exactly what DCS does.

Simply put, Protobuf and gRPC do it better. The developer experience is much better. The tooling is much better. The implementation is better-optimized. It's not a new concept, it's just a better implementation. That's all there is to it.

But anyway, Buf is not re-inventing this wheel, it's just building on Protobuf and gRPC. It seems like your beef is with Protobuf and gRPC, not Buf.

I don't disagree with anything you're saying but I believe you're missing the forest for the trees. The problem I see Buf and other companies trying to solve isn't RPC so much as IDL. Defining and managing schemas is incredibly painful. It doesn't matter if it's a remote network call, HTTP, binary blob over a socket, files on disk or a call to a local function. Defining, sharing and consuming the boundaries -- the interfaces -- rigorously is the painful part.

You're too caught up on the implementation details. If you solve the IDL problems, then you can simply change the implementation and no one is the wiser. grpc-gateway is maybe a good example?

> exactly what ASN.1 does

Okay I actually do disagree with this. ASN.1 is a whole different ballgame. What doesn't it do, besides the obvious complications leading to buggy and insecure implementations every other day?

Please don't use ASN.1

> “So what I was suggesting originally is that Buf should try to explain what they are doing that these other systems failed to do…”

I was having a discussion online yesterday about writing research papers, and this exact line of argumentation was noted.

As I recall, the marketing version is called a ‘white paper’.