by jrockway 1657 days ago
The big secret is that sidecars can only help so much. If you want distributed tracing, the service mesh can't propagate traces into your application (so if service A calls service B which calls service C, you'll never see that end to end with a mesh of sidecars). mTLS is similar; it's great to encrypt your internal traffic on the wire, but that needs to get propagated up to the application to make internal authorization decisions. (I suppose in some sense I like to make sure that "kubectl port-forward" doesn't have magical enhanced privileges, which it does if your app is oblivious to the mTLS going on in the background. You could disable that specifically in your k8s setup, but generally security through remembering to disable default features seems like a losing battle to me. Easier to have the app say "yeah you need a key". Just make sure you build the feature to let oncall get a key, or they will be very sad.)

For that reason, I really do think that this is a temporary hack while client libraries are brought up to speed in popular languages. It is really easy to sell stuff with "just add another component to your house of cards to get feature X", but eventually it's all too much and you'll have to just edit your code.

I personally don't use service meshes. I have played with Istio but the code is legitimately awful, so the anecdotes of "I've never seen it work" make perfect sense to me. I have, in fact, never seen it work. (Read the xDS spec, then read Istio's implementation. Errors? Just throw them away! That's the core goal of the project, it seems. I wrote my own xDS implementation that ... handles errors and NACKs correctly. Wow, such an engineering marvel and so difficult...)

I do stick Envoy in front of things when it seems appropriate. For example, I'll put Envoy in front of a split frontend/backend application to provide one endpoint that serves both the frontend and the backend. That way production is identical to your local development environment, avoiding surprises at the worst possible time. I also put it in front of applications that I don't feel like editing and rebuilding to get metrics and traces.

The one feature that I've been missing from service meshes, Kubernetes networking plugins, etc. is the ability to make all traffic leave the cluster through a single set of services that can see the cleartext of TLS transactions. (I looked at Istio specifically, because it does have EgressGateways, but it's implemented at the TCP level and not the HTTP level. So you don't see outgoing URLs, just outgoing IP addresses. And if someone is exfiltrating data, you can't log that.) My biggest concern with running things in production is not so much internal security, though that is a big concern, but rather "is my cluster abusing someone else". That's the sort of thing that gets your cloud account shut down without appeal, and I feel like I don't have good tooling to stop that right now.


> If you want distributed tracing, the service mesh can't propagate traces into your application (so if service A calls service B which calls service C, you'll never see that end to end with a mesh of sidecars)

Why not? AFAIK traces are sent from the instrumented app to some tracing backend, and a trace-id is carried over via an HTTP header from the entry point of the request until the last service that takes part in that request. Why would a sidecar/mesh break this?

I think the point is that the service mesh can't do the work of propagation. It needs the client to grab the input header, and attach it to any outbound requests. From the perspective of the service mesh, the service is handling X requests, and Y requests are being sent outbound. It doesn't know how each outbound request maps to an input.

So now all of a sudden we do need a client library for each service in order to make sure the header is being propagated correctly.
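
For illustration, the propagation step described above might look roughly like this in Python. This is only a sketch: the header list is an assumption (the B3 and W3C names commonly seen with Envoy-based meshes, so check what your tracer actually uses), and `handle_request` and `call_downstream` are hypothetical names standing in for your handler and HTTP client.

```python
# Assumed set of tracing headers; adjust to whatever your tracing setup emits.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def trace_headers_to_forward(inbound_headers):
    """Pick the tracing headers out of an inbound request (case-insensitive)."""
    lowered = {name.lower(): value for name, value in inbound_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


def handle_request(inbound_headers, call_downstream):
    """Hypothetical handler: the service itself ties inbound to outbound.

    The sidecar sees both requests but cannot know they belong together,
    so the application copies the headers across.
    """
    outbound = {"accept": "application/json"}
    outbound.update(trace_headers_to_forward(inbound_headers))
    return call_downstream(outbound)
```

Only the tracing headers are copied; anything else on the inbound request (cookies, auth) is deliberately left behind.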

But tracing can't be done with a sidecar and no modification to the service code anyway. With a sidecar (or eBPF) you get blackbox metrics for free (connections throughput, latency, errors, etc.), but tracing needs to be done inside the code (either automatically by some third-party library/add-on or by instrumenting manually). I understand the point that, once you are instrumenting for tracing, you can also instrument for metrics and not use a sidecar. But to be fair, distributed tracing is only catching on now, and metrics already give you some visibility that it's better to have than not to have. On top of that you can add tracing and improve observability.

You described the problem correctly.

I think it's a stretch to say that requires a client library, though. It should be straightforward to have whatever library you are already using for http requests pass those headers through.

https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overv...

> a trace-id is carried over via an HTTP header from the entry point of the request until the last service that takes part in that request.

But it's not, unless you specifically code your services to do that. Which isn't hard, but just means plugging an unmodified service into a service mesh isn't enough.

This. Header trace propagation is a godsend.