by jeffbee 1847 days ago
Proto message unmarshal in Go for a small message should be 5 orders of magnitude below 20ms; it shouldn't even begin to matter until you are sweating individual microseconds.

That's true if your program only does a single unmarshal at a time at a leisurely pace. But in a steady-state situation, the garbage left behind by each individual unmarshal call has to be paid for by some poor future request.

I agree it's unlikely the difference here will be solely responsible for tipping the GP's request above 20ms, but the memory problems could reasonably ruin tail latencies.

The significance of 20ms isn't clear so this is hard to judge.

Perhaps they have significant external (network) latency leaving only a few ms budget for the application stack - so they could easily be up against a wall.

Until the GC kicks in and steals a full 200usec + a bunch of your throughput...

(Holy shit, who is downvoting this? It's literally the whole article!)

If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC. Typically 200us for freeing all unreachable memory is not a big deal.
> If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC.

okay, you've done this. three years later it's the same thing again, since you need to accommodate the new features. your users haven't upgraded their computers. what do you do?

Run a profiler and optimize again.
Your original code is already optimized as much as is possible outside of the things mentioned by OP
Don't guess, measure.
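
For anyone who wants to try "measure" on the GC side, here is a minimal sketch, not anyone's actual benchmark. It reads runtime.MemStats before and after a workload and reports how many GC cycles ran and how much stop-the-world pause time they added. The names churn, workload and gcDelta are made up for illustration, and the 1 KiB buffer size is only an assumption to mimic small messages.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sink keeps the most recent buffer reachable so the allocations are real.
var sink [][]byte

// churn allocates n short-lived 1 KiB buffers to give the collector work.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = append(sink[:0], make([]byte, 1024))
	}
}

// workload is a hypothetical stand-in for "decode a lot of small messages".
func workload() {
	churn(1 << 20)
}

// gcDelta runs work and reports how many GC cycles completed during it and
// the total stop-the-world pause time those cycles added.
func gcDelta(work func()) (cycles uint32, pause time.Duration) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	cycles = after.NumGC - before.NumGC
	pause = time.Duration(after.PauseTotalNs - before.PauseTotalNs)
	return cycles, pause
}

func main() {
	cycles, pause := gcDelta(workload)
	fmt.Printf("GC cycles: %d, total pause: %v\n", cycles, pause)
	if cycles > 0 {
		fmt.Printf("average pause per cycle: %v\n", pause/time.Duration(cycles))
	}
}
```

Note that PauseTotalNs only counts stop-the-world time. The CPU the collector burns concurrently does not show up here, so a CPU profile is still needed for the throughput side.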

Then just like C, writing a tiny set of functions in Assembly is always an option.

Properly written Go code (or even Java for that matter) will try to minimize allocations. For Java, unless I am mistaken, pauseless GC is only offered by Azul ($$).
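
To make "minimize allocations" concrete, here is a rough sketch of the usual trick: reuse a caller-owned buffer instead of allocating a fresh one per message. This is not how any protobuf library does it; decodeAlloc and decodeReuse are hypothetical stand-ins that just copy bytes, and testing.AllocsPerRun is used to count heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces results to escape so the compiler cannot elide the allocation.
var sink []byte

// decodeAlloc copies the payload into a fresh buffer on every call.
func decodeAlloc(payload []byte) []byte {
	buf := make([]byte, len(payload))
	copy(buf, payload)
	return buf
}

// decodeReuse copies the payload into dst, reusing dst's backing array
// whenever its capacity is large enough.
func decodeReuse(dst, payload []byte) []byte {
	return append(dst[:0], payload...)
}

func main() {
	payload := make([]byte, 1024)
	scratch := make([]byte, 0, 1024)

	fresh := testing.AllocsPerRun(1000, func() {
		sink = decodeAlloc(payload)
	})
	reused := testing.AllocsPerRun(1000, func() {
		scratch = decodeReuse(scratch, payload)
		sink = scratch
	})

	fmt.Printf("allocs per call, fresh buffer:  %.0f\n", fresh)
	fmt.Printf("allocs per call, reused buffer: %.0f\n", reused)
}
```

The reuse version only works if the caller is done with the previous message before decoding the next one, which is exactly the kind of constraint a general-purpose library has trouble assuming.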
>or even Java

Just in case you may be unaware, the latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available for Go due to sheer age and manpower. Parallel and Pauseless are easily achievable in most cases.

> Latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available.

Beyond hyperbole, do you have any actual comparison of Go vs Java GC performance?

Java's GC is better but Go's GC is also parallel and "pauseless" - iirc ZGC is 50-500usec which is comparable to Go's target 200usec.

The point is, neither is "five orders of magnitude" below 20ms. And neither needs zero CPU even if it doesn't block other threads.
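
For reference, the arithmetic behind that, as a small sketch. The constants are just the figures quoted in this thread (the 20ms budget and the 200usec pause target); ordersBelow is a made-up helper name.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

const (
	budget          = 20 * time.Millisecond  // the 20ms figure from the top post
	gcPause         = 200 * time.Microsecond // the Go pause target quoted above
	fiveOrdersBelow = budget / 100000        // 20ms divided by 10^5
)

// ordersBelow reports how many orders of magnitude d is below ref.
func ordersBelow(d, ref time.Duration) float64 {
	return math.Log10(float64(ref) / float64(d))
}

func main() {
	fmt.Printf("five orders of magnitude below %v is %v\n", budget, fiveOrdersBelow)
	fmt.Printf("%v is %.1f orders of magnitude below %v\n",
		gcPause, ordersBelow(gcPause, budget), budget)
}
```

So five orders of magnitude below 20ms is 200ns, while a 200usec pause is only two orders of magnitude below it.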

Yeah, the whole point of the article is that gRPC v2 (and frankly v1 for that matter) are not “properly written” to do this.
3% regression in QPS, 20% regression in CPU, and 5% regression in memory usage according to the article. Those are considerably worse than "5 orders of magnitude below".
GP meant 5 orders of magnitude below "20 ms". 20 ms is a lot of time.

There is nothing one can do to, say, a 1-kilobyte buffer that will take more than 1 ms in any language. My own Go code doesn't take more than a few microseconds per message.

GP's root claim is that protobuf serialization/deserialization performance shouldn't matter, on an article where a user is specifically demonstrating that it does matter.
The use case described in the article and the use case described in the top post in this thread aren't the same. If you aren't throughput bound, a 5% regression in parse speed doesn't matter if your goal is to stay under 20ms and parsing takes 17 us. Sure, it now takes 19 us, which is a regression of 2 us out of 20ms, or 1/10000th of your time.
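
The budget arithmetic, spelled out as a quick sketch. The constants are the numbers from this comment (17 us before, 19 us after, 20ms budget); budgetShare is a hypothetical helper, and none of this says anything about the throughput-bound case.

```go
package main

import (
	"fmt"
	"time"
)

const (
	requestBudget = 20 * time.Millisecond
	parseBefore   = 17 * time.Microsecond
	parseAfter    = 19 * time.Microsecond
)

// budgetShare returns the fraction of budget consumed by d.
func budgetShare(d, budget time.Duration) float64 {
	return float64(d) / float64(budget)
}

func main() {
	regression := parseAfter - parseBefore
	fmt.Printf("regression: %v\n", regression)
	fmt.Printf("regression as share of budget: %.4f%%\n",
		100*budgetShare(regression, requestBudget))
	fmt.Printf("parse time after regression:   %.4f%% of budget\n",
		100*budgetShare(parseAfter, requestBudget))
}
```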