HN Mirror

Hacker News


	by jzelinskie 1846 days ago
	I hadn't realized that Gogo was in such a bad spot with the upstream Go protobuf changes. There was a lot of drama when the changes were made, and I guess that overshadowed any visibility I had into Gogo. Making vtprotobuf an additional protoc plugin seems like the Right Thing™, although it's a shame how complicated protoc commands end up becoming for mature projects. I'm pretty tempted to port Authzed over to this and run some benchmarks -- our entire service requires e2e latency under 20ms, so every little bit counts. The biggest performance win is likely just having a non-intrusive interface for pooling allocated protos.
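
To make the pooling idea concrete, here is a minimal sketch using the standard library's `sync.Pool` around a hypothetical `Check` type standing in for a generated proto struct. This is not vtprotobuf's generated pool API; the type, its fields, and the `getCheck`/`putCheck` helpers are illustrative assumptions (Go 1.18+ for `any`).

```go
package main

import (
	"fmt"
	"sync"
)

// Check is a hypothetical stand-in for a generated protobuf message.
type Check struct {
	Resource string
	Subject  string
	Tags     []string
}

// Reset clears the message but keeps the Tags backing array for reuse.
func (c *Check) Reset() {
	c.Resource = ""
	c.Subject = ""
	c.Tags = c.Tags[:0]
}

// checkPool hands out *Check values; New runs only when the pool is empty.
var checkPool = sync.Pool{
	New: func() any { return new(Check) },
}

// getCheck returns a zeroed *Check, reusing a pooled one when available.
func getCheck() *Check {
	return checkPool.Get().(*Check)
}

// putCheck resets c and returns it to the pool; c must not be used afterwards.
func putCheck(c *Check) {
	c.Reset()
	checkPool.Put(c)
}

func main() {
	c := getCheck()
	c.Resource = "doc:readme"
	c.Subject = "user:alice"
	c.Tags = append(c.Tags, "hot")
	fmt.Println(c.Resource, c.Subject, len(c.Tags))
	putCheck(c)

	d := getCheck() // may or may not be the same object as c
	fmt.Println(d.Resource == "", len(d.Tags))
	putCheck(d)
}
```

The `Reset` before `Put` is what keeps stale fields from leaking into the next request; the usual caveat is that nothing may keep a reference to a message once it has gone back to the pool.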

jeffbee 1846 days ago

Unmarshaling a small proto message in Go should take about 5 orders of magnitude less than 20ms; it shouldn't even begin to matter until you are sweating individual microseconds.

AYBABTME 1846 days ago

That's true if your program only does a single unmarshal at a time at a leisurely pace. But in a steady state, the garbage left behind by each individual unmarshal call has to be paid for by some poor future request.

I agree it's unlikely the difference here will be solely responsible for tipping the GP's request above 20ms, but the memory problems could reasonably ruin tail latencies.
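
One way to see that per-call garbage is to count heap allocations with the standard library's `testing.AllocsPerRun`. The sketch compares a hypothetical decode step that allocates a fresh buffer on every call with one that reuses a caller-owned buffer; `decodeFresh`, `decodeInto`, and the byte-copy "decoding" are placeholders, not real protobuf code.

```go
package main

import (
	"fmt"
	"testing"
)

var payload = []byte("a small serialized message body")

// sink keeps results reachable so the compiler cannot drop the work.
var sink []byte

// decodeFresh allocates a new output buffer on every call; once the caller
// is done with it, it is garbage that a later GC cycle must reclaim.
func decodeFresh(src []byte) []byte {
	out := make([]byte, len(src))
	copy(out, src)
	return out
}

// decodeInto reuses dst's backing array, so it allocates only when dst
// is too small to hold src.
func decodeInto(dst, src []byte) []byte {
	return append(dst[:0], src...)
}

func main() {
	fresh := testing.AllocsPerRun(1000, func() {
		sink = decodeFresh(payload)
	})

	buf := make([]byte, 0, 64) // pre-sized so steady-state calls never grow it
	reused := testing.AllocsPerRun(1000, func() {
		buf = decodeInto(buf, payload)
	})

	fmt.Printf("fresh: %.0f allocs/op, reused: %.0f allocs/op\n", fresh, reused)
}
```

One allocation per call looks harmless in isolation; multiplied by request rate, it is what sets how often the GC has to run.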

lttlrck 1846 days ago

The significance of 20ms isn't clear so this is hard to judge.

Perhaps they have significant external (network) latency, leaving only a few ms of budget for the application stack, so they could easily be up against a wall.

morelisp 1846 days ago

Until the GC kicks in and steals a full 200usec + a bunch of your throughput...

(Holy shit, who is downvoting this? It's literally the whole article!)

throwaway894345 1846 days ago

If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC. Typically 200us for freeing all unreachable memory is not a big deal.

jcelerier 1846 days ago

> If your path is sensitive to 200us of latency you should probably optimize your application and tune your GC.

Okay, you've done this. Three years later it's the same thing again, since you need to accommodate the new features, and your users haven't upgraded their computers. What do you do?

pjmlp 1846 days ago

Run a profiler and optimize again.

jcelerier 1846 days ago

Your original code is already optimized as much as possible, apart from the things mentioned by OP.

harikb 1846 days ago

Properly written Go code (or even Java, for that matter) will try to minimize allocations. For Java, unless I am mistaken, pauseless GC is only offered by Azul, and that costs $$.

RhodesianHunter 1846 days ago

>or even Java

Just in case you're unaware, the latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available for Go, thanks to sheer age and manpower. Parallel and pauseless collection is easily achievable in most cases.

geodel 1846 days ago

> Latest GCs for Java (Shenandoah, ZGC) are miles ahead of anything available.

Beyond hyperbole, do you have any actual comparison of Go vs Java GC performance?

morelisp 1846 days ago

Java's GC is better, but Go's GC is also parallel and "pauseless" - iirc ZGC is 50-500usec, which is comparable to Go's 200usec target.

The point is, neither is "five orders of magnitude" below 20ms. And neither needs zero CPU even if it doesn't block other threads.

morelisp 1846 days ago

Yeah, the whole point of the article is that gRPC v2 (and frankly v1 for that matter) are not “properly written” to do this.

brandmeyer 1846 days ago

3% regression in QPS, 20% regression in CPU, and 5% regression in memory usage according to the article. Those are considerably worse than "5 orders of magnitude below".

harikb 1846 days ago

GP meant 5 orders of magnitude below "20 ms". 20 ms is a lot of time.

There is nothing one can do to, say, a 1 kilobyte buffer that will take more than 1 ms in any language. My own Go code doesn't take more than a few micros per message.

brandmeyer 1846 days ago

GP's root claim is that protobuf serialization/deserialization performance shouldn't matter, on an article where a user is specifically demonstrating that it does matter.

joshuamorton 1846 days ago

The use case described in the article and the use case described in the top post of this thread aren't the same. If you aren't throughput-bound, a 5% regression in parse speed doesn't matter if your goal is to stay under 20ms and parsing takes 17 us. Sure, it now takes 19 us, which is a regression of 2 us out of 20ms, or 1/10000th of your time.

rapsey 1846 days ago

> our entire service requires e2e latency under 20ms

Why are you using Go then?

kodah 1846 days ago

20ms is a pretty considerable amount of time WRT E2E transaction time in today's world. Can you expand on your concerns with Go?

mcronce 1846 days ago

It's not really suitable for latency-critical applications.

EDIT: Fixed unfortunate typo

fcantournet 1846 days ago

You can 100% write services with p99.9 < 20ms in Go, without even trying that hard. Go is entirely suitable for this kind of constraint; I dare say that's Go's main target.

P99 < 1ms, that's when you're going to want to switch it up.

sagichmal 1846 days ago

Depending on workload, Go also does sub-1ms p99 pretty easily. I'm getting sub-1ms p99.9.

torbital 1846 days ago

What are the proposed solutions to get better than that? C/Rust code? Assembly?

somethingwitty1 1846 days ago

Was the double negative intentional? I've used Go for sub-millisecond needs, so for 20ms it seems like a reasonable choice from where I'm sitting.

mcronce 1846 days ago

It was not intentional, thanks for asking...very unfortunate typo ;)

Go doesn't give you control over inline vs indirect allocation, instead relying on escape analysis, which is notoriously finicky. Seemingly unrelated changes, along with compiler upgrades, can ruin your carefully optimized code.

This is especially heinous because it uses a GC; unnecessary allocations have a disproportionately large impact on your application performance. One or the other wouldn't be nearly as bad.

Time and time again we see reports from organizations/projects with perfectly fine average latency but horrendous p95+ times when written in Go, some going as far as to do straight-up insane optimizations (see Dgraph) or rewrite in other languages.
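
To see the escape-analysis behaviour described above, compare a constructor that returns a pointer with one that returns a value. `go build -gcflags=-m` prints the compiler's escape decisions, and `testing.AllocsPerRun` confirms them at run time. The `point` type and both constructors are hypothetical; `//go:noinline` is there so inlining does not change the result, which is exactly the kind of seemingly unrelated change that can flip it in real code.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ X, Y int }

// Package-level sinks keep results alive so the calls are not optimized away.
var (
	ptrSink *point
	valSink point
)

// newPointPtr returns a pointer to a local, so p escapes and is heap-allocated.
//
//go:noinline
func newPointPtr(x, y int) *point {
	p := point{X: x, Y: y}
	return &p
}

// newPointVal returns by value, so nothing escapes and no heap allocation occurs.
//
//go:noinline
func newPointVal(x, y int) point {
	return point{X: x, Y: y}
}

func main() {
	ptrAllocs := testing.AllocsPerRun(1000, func() { ptrSink = newPointPtr(1, 2) })
	valAllocs := testing.AllocsPerRun(1000, func() { valSink = newPointVal(1, 2) })
	fmt.Printf("pointer: %.0f allocs/op, value: %.0f allocs/op\n", ptrAllocs, valAllocs)
}
```

Because escape decisions can shift between compiler versions, the same `AllocsPerRun` check can live in a `_test.go` file with a fixed allocation budget, so CI flags a regression before it shows up in tail latency.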

ctvo 1846 days ago

But you think this impacts a 20ms budget? It's mostly trivial to get sub-20ms p99 in Go.

pjmlp 1846 days ago

While escape analysis in Go is finicky, you can make it part of your CI/CD pipeline to keep it under control.

https://medium.com/a-journey-with-go/go-introduction-to-the-...

No different than running other kinds of static analysis for well-known languages that are unsafe by default.

Thaxll 1846 days ago

I don't know, I'm able to get 150k gRPC q/sec with sub-1ms p99. It's def better than G1 and CMS.
