by teh_cmc 3859 days ago
I couldn't have said it better. This is almost word for word why I decided to build mmm: I maintain a few high-load RPC services in Go, each of which stores millions, if not tens of millions, of items in its cache.

In this configuration, each incoming request means allocations inside Go's RPC package [1], which in turn means that a GC pass will be triggered whenever the heap has grown past the GC percentage threshold (GOGC) [2], which in turn means that the GC will have to scan all of those long-lived pointers (Go's GC is not generational), which in turn means a huge spike in response time.

This basically leaves me with three possible solutions:

- hack into Go's RPC package to minimize allocations, which is a huge price to pay just to delay the inevitable

- build my caches in a language that offers manual memory management, then query those via RPC from my Go services; but I don't want to add a new language into the mix

- provide a generic solution for manual memory management in Go, which is where we are now
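Some of the benefit of the third option can be had without manual memory management, by keeping cached data out of the GC's view. What follows is a minimal sketch of that different, simpler technique; it is not mmm's API. Since Go 1.5, the GC does not scan the contents of maps whose keys and values contain no pointers, and it never scans the contents of a `[]byte` backing array. `ByteCache`, its FNV-hashed keys and the append-only arena are hypothetical choices for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// ByteCache is a hypothetical, append-only cache whose bulk storage holds
// no Go pointers: values live in one []byte arena, and the index maps a key
// hash to an arena offset. Neither needs to be walked entry by entry during
// the GC's mark phase.
type ByteCache struct {
	index map[uint64]uint32 // key hash -> offset of the entry in arena (arena must stay < 4 GiB)
	arena []byte            // entries stored as [4-byte length][value bytes]
}

func NewByteCache() *ByteCache {
	return &ByteCache{index: make(map[uint64]uint32)}
}

func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// Set appends value to the arena. Overwritten entries are never reclaimed
// and hash collisions are ignored; both are simplifications for the sketch.
func (c *ByteCache) Set(key string, value []byte) {
	off := uint32(len(c.arena))
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(value)))
	c.arena = append(c.arena, hdr[:]...)
	c.arena = append(c.arena, value...)
	c.index[hashKey(key)] = off
}

// Get copies the value out so callers never retain a slice into the arena.
func (c *ByteCache) Get(key string) ([]byte, bool) {
	off, ok := c.index[hashKey(key)]
	if !ok {
		return nil, false
	}
	n := binary.LittleEndian.Uint32(c.arena[off : off+4])
	out := make([]byte, n)
	copy(out, c.arena[off+4:off+4+n])
	return out, true
}

func main() {
	c := NewByteCache()
	c.Set("user:42", []byte("alice"))
	v, ok := c.Get("user:42")
	fmt.Println(string(v), ok) // alice true
}
```

The price is that values must be serialized to bytes. A real cache would also need eviction, collision checks and arena compaction.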

[1] https://golang.org/pkg/net/rpc/

[2] https://golang.org/pkg/runtime/debug/#SetGCPercent


There's also #4, using a ready-to-use external cache that can be accessed over RPC/network. Like memcache, or Redis. I'm sure you've thought of it, so can you explain why it didn't work in your scenario? I'm just curious.
Well, at the limits of performance, you're still looking at additional copies to get the cached data from the cache to the program, then probably another copy to get it from the input to the output, plus several context switches (which, even if they may ultimately have a relatively small effect on throughput, will increase latency). Having it in-process can be faster, if you're willing to pay the complexity price.

Again, it's almost certainly premature optimization to start with that design, but if that's where your optimization leads you, it's not that surprising.

FWIW, I wrote something myself (https://github.com/thejerf/gomempool) that hits a similar problem, but in a completely different dimension. My problem was that I had an otherwise rather placid program (from an allocation perspective) that allocated buffers for messages ranging from several hundred kilobytes to a few megabytes in size. In normal usage only one or two of these are ever in use at a time, but I go through hundreds per second. In my case, each individual GC was actually not that big a deal, but I was triggering them every few seconds: the GC would see a lot of large allocations, successfully clean them up, and then the next batch of large allocations would push the heap across the threshold again. My stats clearly showed that on this system, once I started pooling my []byte buffers, I never even filled up the memory pool itself, and my GC count plummeted so far that I wouldn't particularly care if each one took half a second anymore, which they don't. Almost everything other than those large message buffers was stack-allocated anyhow.
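The buffer-pooling idea above can be sketched with the standard library's `sync.Pool`. This is a swapped-in technique, not gomempool's API. `msgBufSize` (1 MiB) and `handleMessage` are hypothetical. Each handler borrows a buffer and hands it back when done, so steady-state traffic reuses a few large allocations instead of producing fresh garbage that pushes the heap past the GC threshold.

```go
package main

import (
	"fmt"
	"sync"
)

// msgBufSize is a hypothetical upper bound for one message buffer (1 MiB).
const msgBufSize = 1 << 20

// bufPool hands out reusable *[]byte values. Storing a pointer rather than
// the slice itself avoids allocating a new slice header on every Put.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, msgBufSize)
		return &b
	},
}

// handleMessage borrows a buffer, fills the first n bytes, and returns it
// to the pool. n must not exceed msgBufSize.
func handleMessage(n int) int {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	buf := (*bp)[:n]
	for i := range buf {
		buf[i] = byte(i)
	}
	return len(buf)
}

func main() {
	total := 0
	for i := 0; i < 1000; i++ {
		total += handleMessage(512 * 1024)
	}
	fmt.Println(total) // 524288000
}
```

One difference worth knowing: `sync.Pool` may drop idle buffers during GC cycles, so reuse is best-effort. A dedicated pool such as gomempool may make a different trade-off.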

You're certainly right that it's a possible and viable alternative. The reason I didn't go that way is quite simple: I always do my best to avoid external dependencies. One of the reasons I really love Go is its ease of deployment thanks to its lack of dependencies: I love the idea of being just one scp away from running in staging/production; no more, no less.

I know many people won't agree with that, and there are definitely good reasons not to; and still, as far as I'm concerned, minimizing the complexity of my software stack means there's one less thing that I'll have to worry about, and at the end of the day, that is really quite the upside.