Proto message unmarshal in Go for a small message should be 5 orders of magnitude below 20ms; it shouldn't even begin to matter until you are sweating individual microseconds.
That's true if your program only does a single unmarshal at a time at a leisurely pace. But in a steady-state situation, the garbage left behind by each individual unmarshal call has to be paid for by some poor future request.
I agree it's unlikely the difference here will be solely responsible for tipping the GP's request above 20ms, but the memory problems could reasonably ruin tail latencies.
The significance of 20ms isn't clear so this is hard to judge.
Perhaps they have significant external (network) latency leaving only a few ms budget for the application stack - so they could easily be up against a wall.
If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC. Typically 200us for freeing all unreachable memory is not a big deal.
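For anyone who wants to check what their own pauses look like before tuning, here is a minimal sketch using only the standard `runtime` and `runtime/debug` packages. The GOGC value of 200 is an arbitrary illustration, not a recommendation, and `lastGCPause` is a helper name made up for this example. Note it reports the stop-the-world pause, which is only part of the total GC cost.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// lastGCPause forces a collection and returns the most recent
// stop-the-world pause recorded by the runtime.
func lastGCPause() time.Duration {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// PauseNs is a circular buffer; the latest entry sits at (NumGC+255)%256.
	return time.Duration(m.PauseNs[(m.NumGC+255)%256])
}

func main() {
	// Assumption: 200 is an illustrative value. A higher GOGC trades
	// memory for fewer collections; the right number depends on the workload.
	previous := debug.SetGCPercent(200)
	fmt.Println("previous GC percent:", previous)

	// Create some short-lived garbage so the collector has work to do.
	for i := 0; i < 1000; i++ {
		_ = make([]byte, 1024)
	}

	fmt.Println("last stop-the-world pause:", lastGCPause())
}
```

The numbers this prints vary by machine and Go version, so treat them as a starting point for measurement rather than evidence for either side here.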
> If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC.
okay, you've done this, three years later and it's the same thing again since you need to accommodate the new features. your users haven't upgraded their computers. what do you do?
Properly written Go code (or even Java for that matter) will try to minimize allocations. For Java, unless I am mistaken pause-less GC is only offered by Azul - $$
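One common way to minimize allocations in Go is to reuse message objects through `sync.Pool`. A rough sketch follows, with the caveat that `Msg` and `decodeInto` are hypothetical stand-ins for a generated protobuf type and a real unmarshal call, not any library's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Msg is a hypothetical stand-in for a generated protobuf message.
type Msg struct {
	Payload []byte
}

// decodeInto is a placeholder for a real unmarshal call. It copies the
// wire bytes into dst, reusing dst.Payload's backing array when it is
// already large enough.
func decodeInto(dst *Msg, wire []byte) {
	dst.Payload = append(dst.Payload[:0], wire...)
}

// msgPool hands out previously used messages so that steady-state
// request handling does not allocate a new Msg per call.
var msgPool = sync.Pool{
	New: func() interface{} { return new(Msg) },
}

// handle decodes one request into a pooled Msg and returns the payload length.
func handle(wire []byte) int {
	m := msgPool.Get().(*Msg)
	defer msgPool.Put(m)
	decodeInto(m, wire)
	return len(m.Payload)
}

func main() {
	wire := make([]byte, 1024)
	for i := 0; i < 3; i++ {
		fmt.Println(handle(wire))
	}
}
```

The catch with this pattern is that nothing may hold on to the message (or slices into it) after it goes back into the pool, which is easy to get wrong in a larger codebase.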
Just in case you may be unaware, the latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available for Go due to sheer age and manpower. Parallel and Pauseless are easily achievable in most cases.
3% regression in QPS, 20% regression in CPU, and 5% regression in memory usage according to the article. Those are considerably worse than "5 orders of magnitude below".
GP meant 5 orders of magnitude below "20 ms". 20 ms is a lot of time.
There is nothing one can do to, say, a 1-kilobyte buffer that will take more than 1 ms in any language. My own Go code doesn't take more than a few microseconds per message.
GP's root claim is that protobuf serialization/deserialization performance shouldn't matter, on an article where a user is specifically demonstrating that it does matter.
The use case described in the article and the use case described in the top post in this thread aren't the same. If you aren't throughput bound, a small regression in parse speed doesn't matter if your goal is to stay under 20ms and parsing takes 17 us. Sure, it now takes 19 us, which is a regression of 2 us out of 20ms, or 1/10000th of your time.
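The arithmetic is easy to check. A small sketch, where the 17 us and 19 us figures are the illustrative numbers from the comment above rather than measurements, and `budgetShare` is a helper name invented here:

```go
package main

import (
	"fmt"
	"time"
)

// budgetShare returns the fraction of the latency budget consumed by d.
func budgetShare(d, budget time.Duration) float64 {
	return float64(d) / float64(budget)
}

func main() {
	budget := 20 * time.Millisecond
	before := 17 * time.Microsecond // illustrative parse time before the regression
	after := 19 * time.Microsecond  // illustrative parse time after the regression

	regression := after - before
	fmt.Printf("regression: %v, %.4f%% of the budget\n",
		regression, 100*budgetShare(regression, budget))
	fmt.Printf("total parse time after: %.3f%% of the budget\n",
		100*budgetShare(after, budget))
}
```

This only covers the latency-bound case. It says nothing about the throughput and CPU numbers from the article, which is the other half of the disagreement.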