by jerf 3858 days ago
Infrequently enough that it's usually premature optimization to spend too much time worrying about it. But in Go's core application space, high-load network applications, frequently enough that it's something you should be aware of, and there's a realistic chance of optimizing everything else to the point that GC time becomes your biggest problem, even after the latest work on it.

You'd still want to reach for mmm only after A: verifying that GC time really is your biggest problem via profiling and B: taking more normal steps to minimize overallocation first, because with value types Go has some tools (if not necessarily "a lot" of such tools, but definitely some) for dealing with that. But if you end up backed against the wall, this may be helpful.
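
For step A, a quick first check before reaching for a full profiler might look like the sketch below. It only reads the counters the runtime already exposes through `runtime.ReadMemStats`; the `GCReport` type and the `churn` workload are hypothetical names made up for this illustration, not part of mmm or of any library.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// GCReport is a hypothetical summary of what the collector has cost so far.
type GCReport struct {
	NumGC       uint32        // completed GC cycles
	TotalPause  time.Duration // cumulative stop-the-world pause time
	CPUFraction float64       // fraction of available CPU time used by the GC
}

// ReadGCReport snapshots the runtime's own GC counters.
func ReadGCReport() GCReport {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return GCReport{
		NumGC:       m.NumGC,
		TotalPause:  time.Duration(m.PauseTotalNs),
		CPUFraction: m.GCCPUFraction,
	}
}

// sink keeps allocations reachable so they are not optimized away.
var sink [][]byte

// churn is a hypothetical stand-in for the real workload being measured.
func churn(n int) {
	for i := 0; i < n; i++ {
		sink = append(sink, make([]byte, 1024))
		if len(sink) > 1000 {
			sink = sink[:0]
		}
	}
	runtime.GC() // force at least one cycle so the counters move
}

func main() {
	before := ReadGCReport()
	churn(200000)
	after := ReadGCReport()
	fmt.Printf("GC cycles: %d, pause added: %v, GC CPU fraction: %.4f\n",
		after.NumGC-before.NumGC, after.TotalPause-before.TotalPause, after.CPUFraction)
}
```

Running the real service with `GODEBUG=gctrace=1` prints similar per-cycle information without any code changes, which is usually enough to tell whether the GC is worth a closer look.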

You could also arguably add a C: did you really mean to use a language with manual memory management in the first place, or perhaps Rust? Or can you factor just the relevant bit out into such a language and interact via some RPC mechanism back to the Go code base? But as the situation becomes arbitrarily complicated there simply ceases to be a silver bullet.


I couldn't have said it better. This is almost word for word why I decided to build mmm: I maintain a few high-load RPC services in Go, each of which stores millions, if not tens of millions of items in their cache.

In this configuration, each incoming request means allocations inside Go's RPC package [1], which in turn means that a GC pass will be triggered once the GC percent threshold (GOGC) [2] has been reached, which in turn means that the GC will have to scan all of those long-lived pointers (Go's GC is not generational), which in turn means a huge peak in response time.
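
To make the scanning cost concrete, here is a rough sketch that times a forced collection with a large pointer-heavy cache alive, then again with the same data stored as plain values. The `item` type, the helper names and the cache size are assumptions made up for the illustration; actual timings depend heavily on the Go version and the machine, so no numbers are claimed here.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// item is a hypothetical cache entry that contains no pointers itself.
type item struct {
	key   uint64
	value uint64
}

// timeGC forces a full collection and reports how long the call took.
func timeGC() time.Duration {
	start := time.Now()
	runtime.GC()
	return time.Since(start)
}

// buildPointerCache allocates every item separately, so the GC has n
// pointers to follow each time it marks the heap.
func buildPointerCache(n int) []*item {
	cache := make([]*item, n)
	for i := range cache {
		cache[i] = &item{key: uint64(i), value: uint64(i) * 2}
	}
	return cache
}

// buildValueCache stores the items inline in one pointer-free allocation,
// which the GC does not need to scan element by element.
func buildValueCache(n int) []item {
	cache := make([]item, n)
	for i := range cache {
		cache[i] = item{key: uint64(i), value: uint64(i) * 2}
	}
	return cache
}

func main() {
	const n = 5000000 // arbitrary size, chosen only for the demonstration

	ptrs := buildPointerCache(n)
	fmt.Println("forced GC with pointer cache alive:", timeGC())
	runtime.KeepAlive(ptrs)
	ptrs = nil
	runtime.GC() // drop the pointer cache before the second measurement

	vals := buildValueCache(n)
	fmt.Println("forced GC with value cache alive:  ", timeGC())
	runtime.KeepAlive(vals)
}
```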

This basically leaves me with three possible solutions:

- hack into Go's RPC package to minimize allocations, which is a huge price to pay just to delay the inevitable

- build my caches in a language that offers manual memory management, then query those via RPC from my Go services; but I don't want to add a new language into the mix

- provide a generic solution for manual memory management in Go, which is where we are now

[1] https://golang.org/pkg/net/rpc/

[2] https://golang.org/pkg/runtime/debug/#SetGCPercent
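
As a rough idea of what the third option involves, the sketch below keeps a fixed-size array of integers in an anonymous memory mapping, which the Go GC neither manages nor scans. This is not mmm's actual API or implementation: the `OffHeapUint64s` type and its methods are hypothetical, and the code assumes a Unix-like system where `syscall.Mmap` and `syscall.MAP_ANON` are available (Linux, macOS).

```go
package main

import (
	"encoding/binary"
	"fmt"
	"syscall"
)

// OffHeapUint64s is a hypothetical fixed-size array of uint64 values stored
// outside the Go heap. The memory must be released manually with Free.
type OffHeapUint64s struct {
	mem []byte
	n   int
}

// NewOffHeapUint64s maps n*8 bytes of anonymous, zero-filled memory.
func NewOffHeapUint64s(n int) (*OffHeapUint64s, error) {
	if n <= 0 {
		return nil, fmt.Errorf("invalid length %d", n)
	}
	mem, err := syscall.Mmap(-1, 0, n*8,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	return &OffHeapUint64s{mem: mem, n: n}, nil
}

// Set stores v at index i; an out-of-range index panics like a slice would.
func (a *OffHeapUint64s) Set(i int, v uint64) {
	binary.LittleEndian.PutUint64(a.mem[i*8:], v)
}

// Get loads the value at index i.
func (a *OffHeapUint64s) Get(i int) uint64 {
	return binary.LittleEndian.Uint64(a.mem[i*8:])
}

// Free unmaps the memory. Using the array afterwards is a bug, which is
// exactly the kind of responsibility manual memory management brings back.
func (a *OffHeapUint64s) Free() error {
	err := syscall.Munmap(a.mem)
	a.mem = nil
	return err
}

func main() {
	arr, err := NewOffHeapUint64s(1000000)
	if err != nil {
		panic(err)
	}
	defer arr.Free()

	arr.Set(42, 1234)
	fmt.Println(arr.Get(42))
}
```

Anything stored this way must not contain Go pointers, since the GC will not see them and may free whatever they point to.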

There's also #4, using a ready-to-use external cache that can be accessed over RPC/network. Like memcache, or Redis. I'm sure you've thought of it, so can you explain why it didn't work in your scenario? I'm just curious.
Well, at the limits of performance, you're still looking at additional copies to get the cached data from the cache to the program, then probably another copy to get it from the input to the output, plus several context switches (which, even if they may ultimately have a relatively small effect on throughput, will increase latency). Having it in-process can be faster, if you're willing to pay the complexity price.

Again, it's almost certainly premature optimization to start with that design, but if that's where your optimization leads you, it's not that surprising.

FWIW, I wrote something myself that hits a similar problem, but in a completely different dimension: https://github.com/thejerf/gomempool. My problem was that I had an otherwise rather placid program (from an allocation perspective) that liked to allocate buffers for messages that were many hundreds of kilobytes to low numbers of megabytes in size. In normal usage, only maybe one or two of these are ever in use at a time, but I use hundreds per second. In my case, each individual GC was actually not that big a deal, but I was triggering them every few seconds. The GC would see a lot of large allocs, and then successfully clean them up, meaning that the next batch of large allocations would be seen as crossing the threshold again. My stats clearly showed that on this system, once I started pooling my []byte I never even filled up the memory pool itself, and my GCs plummeted so far that I wouldn't even particularly care if they took half-a-second apiece anymore, which they don't. Almost everything other than those large message buffers was stack-alloc'ed anyhow.
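
The general idea of pooling large []byte buffers can be sketched roughly as follows. This is not gomempool's API; `BufferPool` and its methods are hypothetical names, and the sketch assumes a single fixed buffer size to keep things short.

```go
package main

import "fmt"

// BufferPool is a hypothetical fixed-size buffer pool backed by a channel.
type BufferPool struct {
	free    chan []byte
	bufSize int
}

// NewBufferPool keeps at most maxIdle buffers of bufSize bytes around.
func NewBufferPool(maxIdle, bufSize int) *BufferPool {
	return &BufferPool{free: make(chan []byte, maxIdle), bufSize: bufSize}
}

// Get reuses an idle buffer if one is available and allocates otherwise.
// Reused buffers are NOT zeroed, so callers must overwrite what they read.
func (p *BufferPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.bufSize)
	}
}

// Put returns a buffer to the pool. Buffers that are too small, or that do
// not fit because the pool is full, are simply left for the GC.
func (p *BufferPool) Put(b []byte) {
	if cap(b) < p.bufSize {
		return
	}
	select {
	case p.free <- b[:p.bufSize]:
	default:
	}
}

func main() {
	pool := NewBufferPool(4, 1<<20) // up to four idle 1 MiB buffers

	for i := 0; i < 3; i++ {
		buf := pool.Get()
		copy(buf, "message payload goes here")
		// ... process the message ...
		pool.Put(buf) // the next Get reuses this buffer instead of allocating
	}
	fmt.Println("idle buffers:", len(pool.free))
}
```

With only one or two buffers in flight at a time, nearly every `Get` after the first is served from the pool, so the large allocations that kept crossing the GC threshold mostly disappear.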

You're certainly right that it's a possible and viable alternative. The reason I didn't go that way is quite simple: I always try and do my best to avoid external dependencies. One of the reasons I really love Go is its ease of deployment thanks to its lack of dependencies: I love the idea of being just one scp away from running in staging/production; no more, no less.

I know many people won't agree with that, and there are definitely good reasons not to; and still, as far as I'm concerned, minimizing the complexity of my software stack means there's one less thing that I'll have to worry about, and at the end of the day, that is really quite the upside.

One might actually suffer from GC overhead much more than that, i.e. it might be the number one thing to be aware of and optimize, especially if your program has millions of pointers and/or a large heap.
Much more than "a realistic chance of optimizing everything else to the point that GC time becomes your biggest problem"?
Yes, these days, GC might easily be the biggest problem to the point that focusing on anything else would be a waste of time and premature optimization.

GC pauses can be anywhere from 300ms to 30 seconds or more when it starts becoming an issue.