| HN Mirror



	by jeffbee 1847 days ago
	Proto message unmarshal in Go for a small message should be 5 orders of magnitude below 20ms, shouldn't even begin to matter until you are sweating individual microseconds.

4 comments

AYBABTME 1846 days ago

That's true if your program only does a single unmarshal at a time at a leisurely pace. But in a steady-state situation, the garbage left behind by each individual unmarshal call has to be paid for by some poor future request.

I agree it's unlikely the difference here will be solely responsible for tipping the GP's request above 20ms, but the memory problems could reasonably ruin tail latencies.

lttlrck 1846 days ago

The significance of 20ms isn't clear so this is hard to judge.

Perhaps they have significant external (network) latency leaving only a few ms budget for the application stack - so they could easily be up against a wall.

morelisp 1847 days ago

Until the GC kicks in and steals a full 200usec + a bunch of your throughput...

(Holy shit, who is downvoting this? It's literally the whole article!)

throwaway894345 1846 days ago

If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC. Typically 200us for freeing all unreachable memory is not a big deal.
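
For readers who want to see what "tune your GC" can look like in practice, here is a minimal sketch using only the standard `runtime/debug` package. The workload (`churn`), the helper name `measurePauses`, and the chosen percentages are illustrative assumptions, not anything taken from the article; real pause times depend heavily on heap shape and hardware.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// sink keeps the latest allocation reachable from a global so the
// compiler cannot optimize the allocations away.
var sink [][]byte

// churn allocates n short-lived 1 KiB buffers to give the collector work.
// It is a hypothetical stand-in for a request-handling loop.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = append(sink[:0], make([]byte, 1024))
	}
}

// measurePauses sets the GC target percentage, runs the workload, and
// returns how many collections ran and the total pause time they cost.
func measurePauses(gcPercent, allocs int) (int64, time.Duration) {
	old := debug.SetGCPercent(gcPercent)
	defer debug.SetGCPercent(old) // restore the previous setting

	var before, after debug.GCStats
	debug.ReadGCStats(&before)
	churn(allocs)
	runtime.GC() // force at least one cycle so the delta is non-empty
	debug.ReadGCStats(&after)
	return after.NumGC - before.NumGC, after.PauseTotal - before.PauseTotal
}

func main() {
	for _, pct := range []int{50, 100, 400} {
		n, pause := measurePauses(pct, 200000)
		fmt.Printf("GOGC=%d: %d collections, %v total pause\n", pct, n, pause)
	}
}
```

A higher percentage generally trades memory for fewer collections; whether that helps a latency-sensitive path is something to measure rather than assume.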

jcelerier 1846 days ago

> If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC.

Okay, you've done this. Three years later it's the same thing again, since you need to accommodate the new features. Your users haven't upgraded their computers. What do you do?

pjmlp 1846 days ago

Run a profiler and optimize again.

jcelerier 1846 days ago

Your original code is already optimized as much as is possible outside of the things mentioned by OP

pjmlp 1846 days ago

Don't guess, measure.

Then just like C, writing a tiny set of functions in Assembly is always an option.
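
As a rough illustration of "measure", the sketch below times a small unmarshal and reports allocations per operation using `testing.Benchmark` from the standard library. It uses `encoding/json` purely as a stand-in, since protobuf is not in the standard library; the `message` type and `payload` are made up for the example, and the numbers it prints say nothing about protobuf itself.

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// message is a hypothetical small message used only for this example.
type message struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
}

var payload = []byte(`{"id":42,"name":"small message"}`)

// benchmarkUnmarshal runs the decode in a standard benchmark loop and
// returns the result so callers can inspect time and allocations.
func benchmarkUnmarshal() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		var m message
		for i := 0; i < b.N; i++ {
			if err := json.Unmarshal(payload, &m); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	r := benchmarkUnmarshal()
	fmt.Printf("%d ns/op, %d allocs/op, %d B/op\n",
		r.NsPerOp(), r.AllocsPerOp(), r.AllocedBytesPerOp())
}
```

The same loop body can be swapped for whichever decoder is actually under suspicion; a CPU or heap profile is the natural next step once a hot spot shows up.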

harikb 1847 days ago

Properly written Go code (or even Java for that matter) will try to minimize allocations. For Java, unless I am mistaken pause-less GC is only offered by Azul - $$
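
One common way to "minimize allocations" in Go is to reuse a caller-owned buffer instead of allocating a fresh one per call. The sketch below compares the two with `testing.AllocsPerRun`. The function names (`encodeFresh`, `encodeInto`) and the toy `id=` encoding are hypothetical and only serve to show the pattern.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink forces the fresh buffer to escape to the heap, as it would if the
// result were handed to another goroutine or stored in a struct.
var sink []byte

// encodeFresh allocates a new buffer on every call.
func encodeFresh(id int64) []byte {
	buf := make([]byte, 0, 64)
	buf = append(buf, "id="...)
	return strconv.AppendInt(buf, id, 10)
}

// encodeInto reuses the caller's buffer, truncating it to length zero first.
func encodeInto(buf []byte, id int64) []byte {
	buf = append(buf[:0], "id="...)
	return strconv.AppendInt(buf, id, 10)
}

// freshAllocs reports the average allocations per call for encodeFresh.
func freshAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { sink = encodeFresh(42) })
}

// reusedAllocs reports the average allocations per call for encodeInto
// when the same scratch buffer is passed back in every time.
func reusedAllocs() float64 {
	scratch := make([]byte, 0, 64)
	return testing.AllocsPerRun(1000, func() { scratch = encodeInto(scratch, 42) })
}

func main() {
	fmt.Printf("fresh buffer:  %.0f allocs/call\n", freshAllocs())
	fmt.Printf("reused buffer: %.0f allocs/call\n", reusedAllocs())
}
```

The reuse variant only stays allocation-free while the scratch buffer's capacity is large enough; once `append` has to grow it, an allocation happens again.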

RhodesianHunter 1847 days ago

>or even Java

Just in case you may be unaware, the latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available for Go due to sheer age and manpower. Parallel and Pauseless are easily achievable in most cases.

geodel 1846 days ago

> Latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available.

Beyond hyperbole, do you have any actual comparison of Go vs Java GC performance?

morelisp 1846 days ago

Java's GC is better but Go's GC is also parallel and "pauseless" - iirc ZGC is 50-500usec which is comparable to Go's target 200usec.

The point is, neither is "five orders of magnitude" below 20ms. And neither needs zero CPU even if it doesn't block other threads.

morelisp 1847 days ago

Yeah, the whole point of the article is that gRPC v2 (and frankly v1 for that matter) are not “properly written” to do this.

brandmeyer 1847 days ago

3% regression in QPS, 20% regression in CPU, and 5% regression in memory usage according to the article. Those are considerably worse than "5 orders of magnitude below".

harikb 1847 days ago

GP meant 5 orders of magnitude below "20 ms". 20 ms is a lot of time.

There is nothing one can do to, say, a 1 kilobyte buffer that will take more than 1 ms in any language. My own Go code doesn't take more than a few microseconds per message.

brandmeyer 1846 days ago

GP's root claim is that protobuf serialization/deserialization performance shouldn't matter, on an article where a user is specifically demonstrating that it does matter.

joshuamorton 1846 days ago

The use case described in the article and the use case described in the top post in this thread aren't the same. If you aren't throughput bound, a 5% regression in parse speed doesn't matter if your goal is to stay under 20ms and parsing takes 17 us. Even if it now takes 19 us, that is a regression of 2 us out of 20ms, or 1/10000th of your time.