

by jerf 3858 days ago

Well, at the limits of performance, you're still looking at additional copies to get the cached data from the cache to the program, then probably another copy to get it from the input to the output, plus several context switches (which, even if they may ultimately have a relatively small effect on throughput, will increase latency). Having it in-process can be faster, if you're willing to pay the complexity price.

Again, it's almost certainly premature optimization to start with that design, but if that's where your optimization leads you, it's not that surprising.

FWIW, I wrote something myself that hits a similar problem, but in a completely different dimension: https://github.com/thejerf/gomempool

My problem was that I had an otherwise rather placid program (from an allocation perspective) that liked to allocate message buffers ranging from many hundreds of kilobytes to a few megabytes in size. In normal usage, only maybe one or two of these are ever in use at a time, but I go through hundreds per second. In my case, each individual GC was actually not that big a deal, but I was triggering one every few seconds. The GC would see a lot of large allocations and successfully clean them up, and then the next batch of large allocations would cross the threshold again.

My stats clearly showed that on this system, once I started pooling my []byte buffers, I never even filled up the memory pool itself, and my GC count plummeted so far that I wouldn't particularly care if each one took half a second anymore, which they don't. Almost everything other than those large message buffers was stack-allocated anyhow.
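To make the pooling idea concrete, here's a rough sketch of a bounded []byte pool. To be clear, this is not the gomempool API: `BufferPool`, `NewBufferPool`, `Get`, `Put`, and the hit/miss counters are all hypothetical names made up for illustration, and the sizes are arbitrary. It only assumes the pattern described above: a few large buffers live at once, requested very frequently.

```go
package main

import (
	"fmt"
	"sync"
)

// BufferPool is a hypothetical, minimal pool of large byte slices.
// It is an illustrative sketch, not the gomempool implementation.
type BufferPool struct {
	mu      sync.Mutex
	free    [][]byte // idle buffers available for reuse
	maxFree int      // upper bound on idle buffers retained
	bufCap  int      // capacity given to freshly allocated buffers
	Allocs  int      // pool misses: a new buffer had to be allocated
	Reuses  int      // pool hits: an idle buffer was handed back out
}

// NewBufferPool creates a pool that keeps at most maxFree idle buffers.
func NewBufferPool(maxFree, bufCap int) *BufferPool {
	return &BufferPool{maxFree: maxFree, bufCap: bufCap}
}

// Get returns a slice of length n, reusing an idle buffer when one has
// enough capacity. Reused buffers are NOT zeroed.
func (p *BufferPool) Get(n int) []byte {
	p.mu.Lock()
	defer p.mu.Unlock()
	for i := len(p.free) - 1; i >= 0; i-- {
		if cap(p.free[i]) >= n {
			buf := p.free[i]
			// Remove entry i by swapping in the last entry.
			last := len(p.free) - 1
			p.free[i] = p.free[last]
			p.free[last] = nil
			p.free = p.free[:last]
			p.Reuses++
			return buf[:n]
		}
	}
	// Nothing suitable was idle, so allocate a new buffer.
	p.Allocs++
	c := p.bufCap
	if n > c {
		c = n
	}
	return make([]byte, n, c)
}

// Put hands buf back to the pool. The caller must not touch buf afterward.
// If the pool is already full, buf is dropped and left to the GC.
func (p *BufferPool) Put(buf []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.free) < p.maxFree {
		p.free = append(p.free, buf)
	}
}

func main() {
	pool := NewBufferPool(4, 1<<20) // keep up to 4 idle 1 MiB buffers
	for i := 0; i < 1000; i++ {
		buf := pool.Get(512 * 1024) // one "message" of 512 KiB
		buf[0] = byte(i)            // stand-in for real work
		pool.Put(buf)
	}
	fmt.Printf("allocs=%d reuses=%d\n", pool.Allocs, pool.Reuses)
}
```

With only one buffer in flight at a time, the loop above should report a single allocation and 999 reuses, which is the same shape as the effect described: the large allocations stop showing up as fresh garbage. The standard library's sync.Pool is another option, though the GC is free to empty it, so it gives less control over how many big buffers stay resident.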