Hacker News new | ask | show | jobs
by cmeiklejohn 979 days ago
I'm also not sure what you mean by the context doesn't propagate between containers and/or pods -- GRPC isn't aware of these Docker/Kubernetes aspects at all.

Do you actually mean that unless explicitly propagated to a subsequent downstream RPC the data is dropped? If so, that's by design.

However, most large-scale organizations that are doing distributed tracing (e.g., Twitter, Uber) have either invented, reproduced, or leveraged OpenTelemetry's design for this precise thing.

Naive context propagation isn't (really) the difficult part with most of these designs -- it's what you've done, using an interceptor, reading the data and assigning it automatically on subsequent requests -- the challenge is dealing with this under many different, real world conditions: a.) concurrency and thread scheduling; b.) not all services use the same version of downstream RPC libraries; c.) not all calls are GRPC, and some use HTTP (and, different HTTP libraries, at that.); and d.) you cross message passing boundaries: i.e., I receive request, write to Kafka queue/reliable workflow backend (e.g., Cadence, Temporal) and re-read the request and then execute a subsequent RPC as a result of that message.

If you're using Kotlin, I suspect you will run into these challenges. Tune your thread pools up/down, restrict your JVM's resources, and you'll suddenly see that if the thing that handles the request uses different threads/coroutines/etc. then the code block that issues the downstream RPC, you'll start dropping the context without explicit handling of that case.

In fact, a very simple test case in Java where you use several, concurrently executed CompleteableFuture's that each issue RPCs, in a very small thread pool should be enough to see the issue.

2 comments

Thanks for the comment, it seems like you know a lot more about this than I do.

This was a solution that has worked well for my company that averages <1 req/s, so yes I have not tested it under more extreme conditions. This is version 1.0.0 so it is quite new and naive by design. I was posting here to get some feedback on the initial version and see how I can improve it, which you have given me!

Feel free to contribute to the project! It seems like your expertise applies nicely!

It's too bad that there is still no JEP for an official context in Java.

Disclaimer: I wrote the OTel one, and am sad to see yet more context implementations being made, including the OTel one, these really need to all be on the way out.