by zdw 1657 days ago
So instead of making the applications use a good RPC library, we're going to shove more crap into the kernel? No thanks, from a security context and complexity perspective.

Per https://blog.dave.tf/post/new-kubernetes/ , the way that this was solved in Borg was:

> "Borg solves that complexity by fiat, decreeing that Thou Shalt Use Our Client Libraries For Everything, so there’s an obvious point at which to plug in arbitrarily fancy service discovery and load-balancing. "

Which seems like a better solution, if requiring some reengineering of apps.

14 comments

The complexity is an issue (but sidecars are plenty complex too), but the security not so much. BPF C is incredibly limiting (you can't even have loops if the verifier can't prove to its satisfaction that the loop has a low static bound). It's nothing at all like writing kernel C.
You don't have to use C.

There are two projects that enable writing eBPF with Rust [1][2]. I'm sure there is an equivalent with nicer wrappers for C++.

[1] https://github.com/foniod/redbpf

[2] https://github.com/aya-rs/aya

It doesn't make any difference which language you use; the security promises are coming from the verifier, which is analyzing the CFG of the compiled program. C is what most people use, since the underlying APIs are in C, and since the verifier is so limiting that most high-level constructions are off the table.
Sure, I was not implying that Rust would have any security benefits for eBPF.

Just that you can even write eBPF code in more convenient languages.

This has come up here a bunch of times (we do a lot of work in Rust). I've been a little skeptical that Rust is a win here, for basically the reason I gave upthread: you can't really do much with Rust in eBPF, because the verifier won't let you; it seems to me like you'd be writing a dialect of Rust-shaped C. But we did a recent work sample challenge for Rust candidates that included an eBPF component, and a couple good submissions used Rust eBPF, so maybe I'm wrong about that.

I'm also biased because I love writing C code (I know, both viscerally and intellectually, that I should virtually never do so; eBPF is the one sane exception!)

I don't think client libraries are the answer. If you only have one technology stack (say, Java and Spring) and only use one application-layer network protocol (say, HTTP), then maybe it's fine.

But once you have more than one language or framework, you need to write more and more of these libraries. And what happens if it's not just HTTP? What if you need to speak Redis, MySQL, or some random binary protocol? Do you write client libraries for those too? Maybe a company like Google has the scale to do this, but most orgs do not. But even then, what if you have to run some vendor-supplied code that you don't even have source for?

I agree with you that shoving more of this into the kernel isn't desirable, but libraries aren't great. Been there, done that, don't feel like doing it again. I'd rather stick with sidecars.

If you are in a position where you can do that then great. Most folks out there are in a position where they need to run arbitrary applications delivered by vendors without an ability to modify them.

The second aspect is that this can get extremely expensive if your applications are written in a wide number of language frameworks. That's obviously different at Google where the number of languages can be restricted and standardized.

But even then, you could also link a TCP library into your app. Why don't you?

I'm not necessarily advocating for the approach described in the article but it wouldn't worry me from a security perspective. The security model of eBPF is pretty impressive. The security issues arising from engineers struggling to keep the entire model in their head would concern me though.
The industry is moving away from the client library approach. This is possible in a place like Google where they force folks to write software in one of four languages (C++, Java, Go, Python) but doesn't scale to a broader ecosystem.
It sure scales; I have yet to work in an organisation where anything goes.

There are a set of sanctioned languages and that is about it.

The subtle aspect of the comment you're replying to is that _they write everything_.

Hard to cram a new library into some closed source vendor app.

Depends how it was written and made extensible.
In a world without (D)COM, I find it's much, much harder to make common base libraries and force people to use them, especially if you can't also force limit the set of toolchains used in the environment.
The network is the base library - that is the shift you are seeing. You make a call out to a network address with a specific protocol.

Also, as an aside, I think WebAssembly has the potential to shift this back. In a world where libraries and programs are compiled to WebAssembly, it doesn't matter what their source language was, and as such, the client library based approach might swing back into vogue.

> The network is the base library

You remind me of Sun Microsystems' assertion from 20+ years ago: "The Network IS the Computer".

citation: https://www.networkcomputing.com/cloud-infrastructure/networ...

Well they did put the dot in dot-com.
WASM isn't a valid target for many languages, that's one thing.

Two, the case is about the library to interact with the network, so... There's also implementing the protocols.

In addition to whether or not all of your various dev teams' preferred languages have a supported client SDK, you also have the build vs. buy issue. If you're plugging COTS applications into your service mesh, there is no way to force a third party vendor to reengineer their application specifically for you.

This probably dictates a lot of Google's famous "not invented here" behavior, but most organizations can't afford to just write their entire toolchain from scratch and need to use applications developed by third parties.

It is the technically better solution IMO/IME, too.

But that doesn't work when you're trying to sell enterprises the idea of 'just move your workloads to Kubernetes!'. :)

> a good RPC library

I like that approach. If you use client libraries, new RPC mechanisms are "free" to implement (until you need to troubleshoot upgrades). It's also an argument against statically linking.

For instance, if running services on the same machine, io-uring can probably be used? (I'm a noob at this). eBPF for packet switching/forwarding between different hosts, etc.

This may no longer be the case, but back at Google I remember one day finding that my Java library no longer used the client library logger, but instead spawned some other app and talked to it (sending logs to it). That other app used to be a fat client, linked into our app and supported by another team. At first I was like, wtf. Then it hit me: this other team can update their "logging" binary on a different cycle than us (hence we don't have to be on the same "build" cycle). All they needed to do was provide us with a very "thin" and rarely changing interface library. And they can write it in any language they like (Java, C++, Go, Rust, etc.)

Also no need for it to be a .so (or .dll/.dylib), just some quick IPC to send messages around. Actually it can be better. For one, if their app is still buffering messages, my app can exit while theirs still runs. Or security reasons (or not having to think about these), etc. etc. So still statically linked, but with processes talking to each other. (Granted, this does not always work for some special apps, like audio/video plugins, but I think it works fine for the case above.)
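
To make the "thin interface library plus quick IPC" idea concrete, here's a rough sketch in Python. Everything in it is made up for illustration (`ThinLogClient`, `read_records`, the newline-delimited JSON framing); it is not how the Google logger worked, just one plausible shape for it. The client only frames a record and writes it to a socket; everything interesting lives in the other process.

```python
import json
import socket


class ThinLogClient:
    """Hypothetical thin client: frames a record and writes it to a socket."""

    def __init__(self, sock):
        self._sock = sock

    def log(self, level, message):
        # Assumed wire format: one JSON object per line.
        # json.dumps escapes embedded newlines, so "\n" is a safe delimiter.
        record = json.dumps({"level": level, "msg": message}) + "\n"
        self._sock.sendall(record.encode("utf-8"))


def read_records(sock, count):
    """Stand-in for the separate logging process: read `count` records."""
    buf = b""
    while buf.count(b"\n") < count:
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        buf += chunk
    return [json.loads(line) for line in buf.splitlines() if line]
```

In a real setup the two ends would be separate processes joined by a Unix socket or pipe. Because the app only depends on the framing, the logging binary can be rebuilt, or rewritten in another language, without touching the app.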

It does feel a bit like we're trying to monkey patch compiled code but the benefits are pretty clear.
I would argue pretty strenuously that this is not what is being done.

The sockets layer is becoming a facade which can guarantee additional things to applications which are compiled against it, and you've got dependency injection here so that the application layer can be written agnostically and not care about any of those concerns at all.

Well ok, but the dependency injection is not statically checked in this case; it's changed dynamically, perhaps while the application is running. Is that not similar to a monkey patch?
That’s great if you write all your software. If you want to use someone else’s thing then you have to wrap it in that magic everywhere client.
What if a client library does not yet exist for your language?
In a large org, you limit the languages available for projects to ones that are well supported internally, i.e. to those that are known to have a port of the RPC/metrics/status/discovery library. That also makes it easier to have everything under a single build system, under a single set of code styles, etc.

If some developers want to use some new language, they first have to put in the effort by a) demonstrating the business case for using a new language and allocating resources to integrate it into the ecosystem, and b) porting all the shared codebase to that new language.

Absolutely. I was thinking what if there's a good business reason to use a different language that's not the norm for your org. Then you're stuck with an infra problem preventing you from using the right tool for the job.

Of course, this is the exception to the rule you described well :)

I don't think of it as an infra problem, but as an early manifestation of effort that would arise later on, anyway: long-term maintenance of that new language. You need people who know the language to integrate it well with the rest of the codebase, people who can perform maintenance on language-related tasks, people who can train other people on this language, ... These are all problems you'd have later on, but are usually handwaved away as trivial.

Throughout my career nearly every single company I've worked in had That One Codebase written by That One Brilliant Programmer in That One Weird Language that no-one maintains because the original author since left, the language turns out to be dead and because it's extremely expensive to hire or train more people to grok that language just for this project.

There are only 5 languages. JavaScript, C++, Java, Python, C#

This is basically the same set of languages people were writing 20 years ago and will probably be the same set of languages people will write in 20 years from now.

It really depends on your domain. I haven't seen C# a lot, nor python, in some orgs.

For some (like me), it's more a superset of C, assembly, bash, maybe lisp, python and matlab.

For others, it's going to be JavaScript, PHP, CSS, HTML..

I agree though that a library is usually domain-specific, and that you can probably easily identify the subset of languages that you really need official bindings for (thereby making my comment a bit useless, sorry for the noise).

The big secret is that sidecars can only help so much. If you want distributed tracing, the service mesh can't propagate traces into your application (so if service A calls service B which calls service C, you'll never see that end to end with a mesh of sidecars). mTLS is similar; it's great to encrypt your internal traffic on the wire, but that needs to get propagated up to the application to make internal authorization decisions. (I suppose in some sense I like to make sure that "kubectl port-forward" doesn't have magical enhanced privileges, which it does if your app is oblivious to the mTLS going on in the background. You could disable that specifically in your k8s setup, but generally security through remembering to disable default features seems like a losing battle to me. Easier to have the app say "yeah you need a key". Just make sure you build the feature to let oncall get a key, or they will be very sad.)

For that reason, I really do think that this is a temporary hack while client libraries are brought up to speed in popular languages. It is really easy to sell stuff with "just add another component to your house of cards to get feature X", but eventually it's all too much and you'll have to just edit your code.

I personally don't use service meshes. I have played with Istio but the code is legitimately awful, so the anecdotes of "I've never seen it work" make perfect sense to me. I have, in fact, never seen it work. (Read the xDS spec, then read Istio's implementation. Errors? Just throw them away! That's the core goal of the project, it seems. I wrote my own xDS implementation that ... handles errors and NACKs correctly. Wow, such an engineering marvel and so difficult...)

I do stick Envoy in front of things when it seems appropriate. For example, I'll put Envoy in front of a split frontend/backend application to provide one endpoint that serves both the frontend and backend. That way production is identical to your local development environment, avoiding surprises at the worst possible time. I also put it in front of applications that I don't feel like editing and rebuilding to get metrics and traces.

The one feature that I've been missing from service meshes, Kubernetes networking plugins, etc. is the ability to make all traffic leave the cluster through a single set of services, who can see the cleartext of TLS transactions. (I looked at Istio specifically, because it does have EgressGateways, but it's implemented at the TCP level and not the HTTP level. So you don't see outgoing URLs, just outgoing IP addresses. And if someone is exfiltrating data, you can't log that.) My biggest concern with running things in production is not so much internal security, though that is a big concern, but rather "is my cluster abusing someone else". That's the sort of thing that gets your cloud account shut down without appeal, and I feel like I don't have good tooling to stop that right now.

> If you want distributed tracing, the service mesh can't propagate traces into your application (so if service A calls service B which calls service C, you'll never see that end to end with a mesh of sidecars)

Why not? AFAIK traces are sent from the instrumented app to some tracing backend, and a trace-id is carried over via an HTTP header from the entry point of the request until the last service that takes part in that request. Why would a sidecar/mesh break this?

I think the point is that the service mesh can't do the work of propagation. It needs the client to grab the input header, and attach it to any outbound requests. From the perspective of the service mesh, the service is handling X requests, and Y requests are being sent outbound. It doesn't know how each outbound request maps to an input.

So now all of a sudden we do need a client library for each service in order to make sure the header is being propagated correctly.
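
For what it's worth, the per-service work is small. Here's a minimal sketch in Python, assuming the W3C `traceparent`/`tracestate` headers plus the `x-request-id` and `x-b3-*` headers that Envoy-style setups commonly use. The helper name and the exact header list are my own illustration, not something from a particular library; check what your mesh actually expects.

```python
# Illustrative list only; adjust to whatever your tracing setup uses.
TRACE_HEADERS = (
    "x-request-id",
    "traceparent",
    "tracestate",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
)


def propagate_trace_headers(inbound_headers, outbound_headers=None):
    """Hypothetical helper: copy trace context from an inbound request
    onto the headers of an outbound request made while handling it."""
    out = dict(outbound_headers or {})
    # HTTP header names are case-insensitive, so normalize before matching.
    inbound = {name.lower(): value for name, value in inbound_headers.items()}
    for name in TRACE_HEADERS:
        if name in inbound:
            out[name] = inbound[name]
    return out
```

The service calls this for every outbound request it makes on behalf of an inbound one. That mapping from inbound to outbound is exactly the part the sidecar can't infer on its own.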

But tracing cannot be done with a sidecar and no modification to the service code anyway. With a sidecar (or eBPF) you get blackbox metrics for free (connection throughput, latency, errors, etc.), but tracing needs to be done inside the code (either automatically by some third-party library/addon or by instrumenting manually). I understand the point that, once you are instrumenting for tracing, you can also instrument for metrics and not use a sidecar. But to be fair, distributed tracing is only catching on now, and metrics already give you some kind of visibility that it's better to have than not to have. On top of that you can add tracing and improve the observability.
You described the problem correctly.

I think it's a stretch to say that requires a client library, though. It should be straightforward to have whatever library you are already using for http requests pass those headers through.

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...

> a trace-id is carried over via an HTTP header from the entry point of the request until the last service that takes part in that request.

But it's not, unless you specifically code your services to do that. Which isn't hard, but just means plugging an unmodified service into a service mesh isn't enough.

This. Header trace propagation is a godsend.
I'm sure someone will write leftPad in eBPF any day now.
Indeed. We could even embed a WASM runtime (headless v8?) so one can execute arbitrary JavaScript in-kernel… wait :)
eBPF is far too limited to run a WASM runtime. That's why the proposed article approach is even possible.