Hacker News
by gen220 2099 days ago
FWIW, I think that gRPC/protobufs have pretty compelling answers to each of the historically-valid complaints you've listed here.

- cpu cycle overhead: this is valid if the overhead is very high or very important. otherwise, most companies would love to trade off cpu cycles for dev productivity.

- refactorings/renamings without deployment staggering: protobufs were specifically designed with this in mind, insofar as they support deprecating fields and whatnot. However, writing a deprecatable API is a skill, even with protos. If you have many clients and want to redo everything from scratch, you will have problems.

- "testing locally" (which I take to mean integration testing locally) is the only one that requires some imagination to solve, assuming all your traffic is guarded by short-term-lease certs issued by Vault or something similar. But even this is quite achievable.

- error stack traces included for free: may I introduce you to context.abort(). It doesn't give you a stack trace by default, but you can wrap the stack trace into the message if you so care to. OpenTracing isn't quite free in a performance sense, but in a required-eng-time-to-set-up-and-maintain sense, it is pretty cheap.

- dependency on secops to make network changes: I've never encountered this, but I bet you that a good platform team can provide a system where application teams effectively don't need to worry about this. It's impossible to overcome this challenge in an existing company that's used to doing things this way, though.
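For the `context.abort()` point, here is a minimal sketch in plain Python. In grpcio, a servicer's `context.abort(code, details)` ends the RPC with a status code and a details string. To keep the sketch runnable without grpcio, `FakeContext`, `AbortError`, `lookup_user` and the string status code are hypothetical stand-ins; the only technique shown is folding `traceback.format_exc()` into the details so the caller sees where the server failed.

```python
import traceback


class AbortError(Exception):
    """Raised by the stand-in context, mimicking how abort() ends an RPC."""

    def __init__(self, code: str, details: str) -> None:
        super().__init__(f"{code}: {details}")
        self.code = code
        self.details = details


class FakeContext:
    """Hypothetical stand-in for a gRPC servicer context (not the grpcio class)."""

    def abort(self, code: str, details: str) -> None:
        raise AbortError(code, details)


def lookup_user(user_id: int) -> dict:
    users = {1: {"name": "alice"}}
    return users[user_id]  # raises KeyError for unknown ids


def get_user_handler(request: dict, context: FakeContext) -> dict:
    try:
        return lookup_user(request["user_id"])
    except KeyError:
        # Wrap the server-side stack trace into the error details so the
        # client sees where the failure happened, not just that it failed.
        context.abort("NOT_FOUND", "user lookup failed\n" + traceback.format_exc())
        raise  # unreachable: abort() always raises


try:
    get_user_handler({"user_id": 2}, FakeContext())
except AbortError as err:
    print(err.code)
    print(err.details)
```

In a real service you would probably gate this behind a debug flag, since shipping full server stack traces to arbitrary clients can leak internals.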

> cpu cycle overhead

The original poster's point was CPU and network overhead. A local procedure/function call or message-send takes on the order of one or up to a few nanoseconds. Depending on how you organize things, an IPC is going to be in the microsecond or even millisecond range. That's a lot of orders of magnitude. It's also latency that you just aren't going to get back, no matter what extra resources you throw at it. [1][2]
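The "orders of magnitude" claim is simple arithmetic: take the base-10 log of the ratio between an IPC round trip and a local call. The figures below are illustrative assumptions drawn from the ranges quoted above (about 1 ns for a local call, 1 µs to 1 ms for an IPC), not measurements.

```python
import math

LOCAL_CALL_NS = 1.0  # assumed: roughly one nanosecond for an in-process call


def orders_of_magnitude(ipc_ns: float, local_ns: float = LOCAL_CALL_NS) -> float:
    """Base-10 log of how much slower a remote call is than a local one."""
    return math.log10(ipc_ns / local_ns)


# Assumed IPC latencies at both ends of the microsecond-to-millisecond range.
for label, ipc_ns in [("1 us IPC", 1_000.0), ("1 ms IPC", 1_000_000.0)]:
    print(f"{label}: {orders_of_magnitude(ipc_ns):.0f} orders of magnitude")
```

With those assumed figures the gap works out to 3 and 6 orders of magnitude respectively.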

In the early naughties, a very SOA/microservice-y BBC backend system that I re-architected as a monolith became around 1000x faster. [3]

In addition, in-process calls are essentially 100% reliable. Network calls, and the various processes attached to them, not so much (see [1], again). The BBC system not only became a lot faster, it also became roughly 100 times more reliable, and that's probably lowballing it a bit. It essentially didn't fail for internal reasons once we learned about Java VM parameters. And it was less code, did more, and was easier to develop for.
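One way to see the reliability gap: if each network hop in a request path independently succeeds with probability p, a request that needs n sequential hops succeeds with probability p^n, while an in-process call contributes a factor of essentially 1. The independence assumption and the 99.9% per-hop figure below are illustrative only, not numbers from the BBC system; correlated failures and retries would change the picture.

```python
def chain_success_rate(per_hop: float, hops: int) -> float:
    """Probability that all `hops` independent calls succeed."""
    return per_hop ** hops


# Assumed per-hop success rate of 99.9%.
for hops in (1, 5, 20):
    rate = chain_success_rate(0.999, hops)
    print(f"{hops:>2} hops: {rate:.4%} end-to-end success")
```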

[1] https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...

[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115...

[3] https://link.springer.com/chapter/10.1007%2F978-1-4614-9299-...

Ah, gotcha, thank you for locking in on the issue. You're absolutely right that network hops introduce overhead (I intended to include I/O blocking on network calls under the banner of cpu cycles, alongside serialization).

Like any other design decision, there's a trade-off here (see my other comments in this tree about how many 9s of reliability/latency you're targeting).

If you're working in an environment where sub-5ms latency to the 4th or 5th 9 is critical, inter-machine communication is not for your application, period.
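"Sub-5ms to the 4th or 5th 9" means the 99.99th or 99.999th percentile of request latency must stay under 5 ms. A minimal nearest-rank percentile check, assuming you already have a list of latency samples in milliseconds; the sample data is made up to show how a handful of slow requests can pass the third 9 but fail the fourth.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_budget(samples: list[float], pct: float, budget_ms: float) -> bool:
    """True if the pct-th percentile latency is within budget_ms."""
    return percentile(samples, pct) <= budget_ms


# Made-up data: 9,995 fast requests and 5 slow ones (10,000 total).
latencies = [1.2] * 9_995 + [40.0] * 5
print("p99.9  within 5 ms:", meets_budget(latencies, 99.9, 5.0))
print("p99.99 within 5 ms:", meets_budget(latencies, 99.99, 5.0))
```

Note that verifying a 4th or 5th 9 needs a lot of samples: with only 10,000 requests, the p99.99 is decided by the single slowest request or two.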

Reliability, as an orthogonal concern, is one that has improved incredibly since the early aughts. The "transport" and error-handling layer of open-source RPC frameworks has improved by orders of magnitude. I'd recommend taking a long look at the experiences of companies built on gRPC.

It's much easier to build a reliable SOA-esque system today than it was even 5 years ago. It's been an area of rapid progress.
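Much of that transport and error-handling maturity amounts to policies like "retry transient failures with backoff, but respect an overall deadline". The sketch below hand-rolls that idea in plain Python; it is not gRPC's built-in retry configuration, and `TransientError`, `flaky_call` and the default timings are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (think UNAVAILABLE)."""


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.01,
    deadline_s: float = 1.0,
) -> T:
    """Call fn, retrying on TransientError with jittered exponential backoff.

    Re-raises the last error once attempts run out or the next sleep would
    push past the overall deadline.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    start = time.monotonic()
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            attempt += 1
            # Double the delay each attempt; jitter avoids synchronized retry storms.
            delay = base_delay_s * 2 ** (attempt - 1) * random.uniform(0.5, 1.0)
            past_deadline = time.monotonic() - start + delay > deadline_s
            if attempt >= max_attempts or past_deadline:
                raise
            time.sleep(delay)


calls = {"n": 0}


def flaky_call() -> str:
    """Hypothetical RPC stub that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("unavailable")
    return "ok"


print(call_with_retries(flaky_call), "after", calls["n"], "attempts")
```

Retries are only safe for idempotent calls; for anything else the server needs deduplication (for example, a request id) before a policy like this is appropriate.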

Yes, obviously these are trade-offs.

However, I find the way you framed these trade-offs decidedly...odd, in terms of "who needs that kind of super-high performance and reliability????", as if achieving that were only possible through herculean effort that just isn't worth it for most applications.

The fact of the matter is that a local message-send is also a helluva lot easier than any kind of IPC. It's also easier to deploy, since it ships in the same binary and is therefore already there, and easier to monitor (no monitoring needed).

So the trade-off is more appropriately framed as follows: why on earth would you want to spend significant extra effort in coding, deployment and monitoring, for the dubious "benefit" of frittering away 3-6 orders of magnitude of performance and perfect reliability?

Of course there can be benefits that outweigh all these negatives in effort and performance/reliability, but those benefits have to be pretty amazing to be worth it.

> as if achieving that were only possible through herculean effort

I encourage you to reread my comments, I'm not suggesting anywhere that high-performance requires exceptional effort.

In fact, I'm actively admitting that for applications where high-performance is required, IPCs/RPCs are not an option.

> just isn't worth it for most applications

Performance is valuable, but it's one dimension of value.

My premise is that, given the maturity of RPC frameworks and network tooling in 2020, most already-networked applications can afford to trade the performance hit of additional hops on the backend.

Whether what you get in exchange for that performance hit is valuable?

That is mostly a function of the quality of your eng platform.

> a local message-send is also a helluva lot easier [on the programmer?] than any kind of IPC

This strongly depends on your engineering org, although it seems like this is the point that's hardest to imagine for some people.

If you're on a team that depends on the availability of data maintained by N other teams, then (given the maturity of RPC frameworks and network tooling in 2020, again) it is much easier to apply SLOs and SLAs to an interface that's gated by an RPC service.
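Part of what makes an RPC boundary a convenient place for an SLO is that every call yields a countable success or failure. A sketch of the standard error-budget arithmetic; the 99.9% target, 30-day window and call counts are example values only.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability for a time-based availability SLO."""
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, total_calls: int, failed_calls: int) -> float:
    """Fraction of a request-based error budget still unspent (negative if blown)."""
    allowed_failures = (1.0 - slo) * total_calls
    if allowed_failures <= 0:
        raise ValueError("a 100% SLO leaves no error budget")
    return 1.0 - failed_calls / allowed_failures


print(f"{error_budget_minutes(0.999):.1f} minutes of downtime per 30 days")
print(f"{budget_remaining(0.999, total_calls=1_000_000, failed_calls=250):.0%} of budget left")
```

A 99.9% target over 30 days leaves about 43.2 minutes; counting failed calls per dependency is what lets a team hold each of its N upstreams to a number like that.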

> spend significant extra effort in coding, deployment and monitoring

The extra effort here is made completely negligible by the existence of a decent platform team.

FWIW, I wouldn't be able to imagine it if I hadn't experienced it myself.

> benefits have to be pretty amazing to be worth it

I still think you're overestimating some of the costs (see above). FWIW, I've worked in an RPC-oriented environment for years now, and reliability has never been a concern. Our platform team is pretty good, but we are not a Google-esque company (200 engineers, including eng managers).

The performance trade-off has been demonstrably worthwhile, because we've used it to purchase a degree of team independence that would not have been otherwise possible.

>In fact, I'm actively admitting that for applications where high-performance is required, IPCs/RPCs are not an option.

But you're framing it as "...for applications where high-performance is required", as if taking the performance, expressiveness and reliability hits should obviously be the default, unless you have very special circumstances.

My point is, and continues to be, that it's the other way around: you should go for simplicity, reliability and performance unless you have, and can demonstrate you have, very special requirements.

Thrift or protobuf is a huge step up from the alternatives, but you still have a lot of overhead. Generics are limited and you're essentially forced to "defunctionalise the continuation" everywhere: any time you want to pass a callback around you have to turn it into a command object instead.

I don't disagree with you; this actually sounds like the beginning of a super interesting conversation.

Can you share some examples of the generics problem and "defunctionalizing the continuation"?

Does Google's `any` package help with the generics problem you describe? (Acknowledging that it's obviously clunky)

> Can you share some examples of the generics problem and "defunctionalizing the continuation"?

Well, the generics problem is that you don't have generics. So you just can't define a lot of general-purpose functions in gRPC, and have to make a specific version of them instead. Even something like "query for objects like this and then apply this transform to the results" just can't be done, because there's no way to pass the transformation over the wire, so you have to come up with a data structure to represent all the transformations that you want to do instead. "Defunctionalizing the continuation" is the technique for doing that (https://www.cis.upenn.edu/~plclub/blog/2020-05-15-Defunction... is an example), but it's a manual process that requires creative effort each time.
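A minimal sketch of that defunctionalization in plain Python rather than protobuf. The local version takes any callable; the wire-friendly version replaces the callable with a closed set of data types (command objects) plus an interpreter, `apply`, that the server runs. `Scale`, `AddOffset` and `Clamp` are hypothetical transforms; in a gRPC API they would presumably become protobuf messages, perhaps under a `oneof`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Union


def query_and_transform(rows: list[float], transform: Callable[[float], float]) -> list[float]:
    """Higher-order version: fine in-process, but a callable can't go over the wire."""
    return [transform(r) for r in rows]


# Defunctionalized version: each transform a caller may need becomes plain data.
@dataclass
class Scale:
    factor: float


@dataclass
class AddOffset:
    offset: float


@dataclass
class Clamp:
    low: float
    high: float


Transform = Union[Scale, AddOffset, Clamp]


def apply(step: Transform, x: float) -> float:
    """Interpreter: gives each command object its meaning on the server side."""
    if isinstance(step, Scale):
        return x * step.factor
    if isinstance(step, AddOffset):
        return x + step.offset
    if isinstance(step, Clamp):
        return min(max(x, step.low), step.high)
    raise TypeError(f"unknown transform: {step!r}")


def query_and_transform_remote(rows: list[float], steps: list[Transform]) -> list[float]:
    """Same shape as query_and_transform, but takes serializable steps."""
    results = []
    for value in rows:
        for step in steps:
            value = apply(step, value)
        results.append(value)
    return results


steps: list[Transform] = [Scale(2.0), AddOffset(1.0), Clamp(0.0, 10.0)]
# Unlike a lambda, the command list serializes cleanly into a request payload.
print(json.dumps([{"kind": type(s).__name__, **asdict(s)} for s in steps]))
print(query_and_transform_remote([1.0, 3.0, 7.0], steps))
```

Every new kind of transform means a new command type and a new branch in `apply` (and, over gRPC, a new message), which is the manual, per-case effort described above.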

> Does google's `any` package help with the generics problem you describe? (Acknowledging that it's obviously clunky)

Not really, because you don't have the type information at compile time. Erased generics are fine in a well-typed language, but with just an `any` type you can't even express something as simple as a function that takes two values of the same type.
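To make the `any` point concrete: protobuf's `google.protobuf.Any` carries a `type_url` string plus serialized bytes (`value`), so "both arguments have the same type" becomes a runtime check you write by hand. `AnyBox` below is a hypothetical stand-in mirroring those two fields, not the protobuf class, and the `example.User`/`example.Order` type URLs are made up. The merge-by-concatenation relies on the protobuf wire-format rule that concatenating two serialized messages of the same type merges them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnyBox:
    """Hypothetical stand-in for google.protobuf.Any: a type URL plus payload bytes."""
    type_url: str
    value: bytes


def merge_same_type(a: AnyBox, b: AnyBox) -> AnyBox:
    """Merge two payloads that are supposed to be the same message type.

    Nothing in the signature ties a's type to b's, so the check can only
    happen at runtime, on every call, after the data has crossed the wire.
    """
    if a.type_url != b.type_url:
        raise TypeError(f"type mismatch: {a.type_url} vs {b.type_url}")
    return AnyBox(a.type_url, a.value + b.value)


USER = "type.googleapis.com/example.User"    # hypothetical message type
ORDER = "type.googleapis.com/example.Order"  # hypothetical message type

print(merge_same_type(AnyBox(USER, b"\x08\x01"), AnyBox(USER, b"\x10\x02")))
try:
    merge_same_type(AnyBox(USER, b""), AnyBox(ORDER, b""))
except TypeError as err:
    print("rejected:", err)
```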

People who are downvoting the parent comment: I'd love to know why. I won't claim expertise here, but it doesn't strike me as clearly incorrect.