| FWIW, I think that gRPC/protobufs have pretty compelling answers to each of the historically-valid complaints you've listed here. - cpu cycle overhead: this is valid if the overhead is very high or very important. otherwise, most companies would love to trade off cpu cycles for dev productivity. - refactorings/renamings without deployment staggering. protobufs were specifically designed with this in mind, insofar as they support deprecating fields and whatnot. However, writing a deprecatable-API is a skill, even with protos. If you have many clients and want to redo everything by scratch, you will have problems. - "testing locally" (which I take to mean integration testing locally) is the only one that requires some imagination to solve, assuming all your traffic is guarded by short-term-lease certs issued by vault or something similar. But even this is quite achievable. - error stack traces included for free: may I introduce you to context.abort(). It's not a stack trace by default, but you can actually wrap the stack trace into the message if you so-care to. opentracing isn't quite free, in a performance sense, but in a required-eng-time-to-setup-and-maintain-sense, it is pretty cheap. - dependency on secops to make network changes: I've never encountered this, but I bet you that a good platform team can provide a system where application teams effectively don't need to worry about this. It's impossible to overcome this challenge in an existing company that's used to doing things this way, though. |
The original poster's point was CPU and network overhead. A local procedure/function call or message-send takes on the order of one or up to a few nanoseconds. Depending on how you organize things, an IPC is going to be in the microsecond or even millisecond range. That's a lot of orders of magnitude. It's also latency that you just aren't going to get back, no matter what extra resources you throw at it. [1][2]
In the early naughties, a rewrite of very SOA/microservice-y BBC backend system I re-architected as a monolith became around 1000x faster. [3]
In addition, in-process calls are essentially 100% reliable. Network calls, and various processes attached to them, not so much (see [1], again). The BBC system not just became a lot faster, it also became roughly 100 times more reliable, and that's probably low-balling it a bit. It essentially didn't fail for internal reasons after we learned about Java VM parameters. And it was less code, did more, and was easier to develop for.
[1] https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...
[2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.115...
[3] https://link.springer.com/chapter/10.1007%2F978-1-4614-9299-...