
by pron 341 days ago

> It turns out that using more RAM costs you more CPU

Yes, memory bandwidth adds another layer of complication, but it doesn't matter so much once your live set is much larger than your L3 cache. I.e. a 200MB live set and a 100GB live set are likely to require the same bandwidth. Add to that the fact that tracing GCs' compaction can also help (with prefetching) and the situation isn't so clear.

> That's the kind of problem that GC was really meant for.

Given the huge strides in tracing GCs over the past ten and even five years, and their incredible performance today, I don't think it matters what those of 40+ years ago were meant for. I agree there are still some workloads where another approach is more efficient than a tracing GC: not everything that isn't spaghetti-like, but specifically arena-friendly workloads (the young gen works a little like an arena, but not quite), which is why GCs are now turning their attention to that kind of workload, too. The point remains that it's very useful to have a memory management approach that can turn the RAM you've already paid for to reduce CPU consumption.

Indeed, we're not seeing any kind of abandonment of tracing GC at a rate that is even close to suggesting some significant economic value in abandoning them (outside of very RAM-constrained hardware, at least).


zozbot234 341 days ago

> The point remains that it's very useful to have a memory management approach that can turn the RAM you've already paid for to reduce CPU consumption.

That approach is specifically arenas: if you can put useful bounds on the maximum size of your "dead" data, it can pay to allocate everything in an arena and free it all in one go. This saves you the memory traffic of both manual management and tracing GC. But coming up with such bounds involves manual choices, of course.
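To make the arena idea concrete, here is a minimal sketch of a bump allocator, assuming a fixed capacity chosen up front (the "bound" mentioned above). The `Arena` class and its `alloc`/`reset` methods are hypothetical names used for illustration only, not any particular library's API.

```python
class Arena:
    """Toy bump allocator over one preallocated buffer (hypothetical API)."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)  # the manually chosen upper bound
        self.top = 0                    # offset of the next free byte

    def alloc(self, size):
        """Reserve `size` bytes and return a writable view of them."""
        if self.top + size > len(self.buf):
            raise MemoryError("arena bound exceeded")
        start = self.top
        self.top += size                # allocation is just a pointer bump
        return memoryview(self.buf)[start:start + size]

    def reset(self):
        """Free every allocation in one go by rewinding the bump pointer."""
        self.top = 0


arena = Arena(1024)
a = arena.alloc(16)
a[:5] = b"hello"
b = arena.alloc(32)
used = arena.top   # bytes handed out so far
arena.reset()      # no per-object free, no tracing
```

Nothing is done per object at free time, which is where the saving over both manual management and tracing comes from; the cost is that the capacity has to be picked by hand.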

It goes without saying that memory compaction involves a whole lot of extra traffic on the memory subsystem, so it's unlikely to help when memory bandwidth is the key bottleneck. Your claim that a 200MB working set is probably the same as a 100GB working set (or, for that matter, a 500MB or 1GB working set, which is more in the ballpark of real-world comparisons) when it comes to how it's impacted by the memory bottleneck is one that I have some trouble understanding also - especially since you've been arguing for using up more memory for the exact same workload.

Your broader claim wrt. memory makes a whole lot of sense in the context of how to tune an existing tracing GC when that's a forced choice anyway (which, AIUI, is also what the talk is about!) but it just doesn't seem all that relevant to the merits of tracing GC vs. manual memory management.

> we're not seeing any kind of abandonment of tracing GC at a rate that is even close to suggesting some significant economic value in abandoning them

We're certainly seeing a lot of "economic value" being put on modern concurrent GCs that can at least perform tolerably well even without a lot of memory headroom. That's how the Golang GC works, after all.


pron 341 days ago

> It goes without saying that memory compaction involves a whole lot of extra traffic on the memory subsystem

It doesn't go without saying that compaction involves a lot of memory traffic, because memory is utilised to reduce the frequency of GC cycles and only live objects are copied. The whole point of tracing collection is that extra RAM is used to reduce the total amount of memory management work. If we ignore the old generation (which the talk covers separately), the idea is that you allocate more and more in the young gen, and when it's exhausted you compact only the remaining live objects (which is a constant for the app); the more memory you assign to the young gen, the less frequently you need to do even that work. There is no work for dead objects.
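As a rough illustration of that argument, here is a toy model of a copying young generation. It assumes a constant live set that survives every collection, ignores the old generation, and treats dead objects as free to reclaim; `young_gen_copy_work` and its parameters are made-up names for this sketch, not any real collector's interface.

```python
def young_gen_copy_work(total_alloc_mb, young_gen_mb, live_mb):
    """Return (collections, mb_copied) for a toy copying young generation.

    Assumption: the same `live_mb` of objects survives each collection,
    and garbage costs nothing to reclaim.
    """
    if young_gen_mb <= live_mb:
        raise ValueError("young gen must be larger than the live set")
    used = live_mb        # space occupied right after a collection
    collections = 0
    copied = 0
    allocated = 0
    while allocated < total_alloc_mb:
        free = young_gen_mb - used
        step = min(free, total_alloc_mb - allocated)
        allocated += step
        used += step
        if used == young_gen_mb:   # exhausted: evacuate survivors only
            collections += 1
            copied += live_mb
            used = live_mb
    return collections, copied


# Same program (10,000 MB allocated, 100 MB live), three young-gen sizes.
for young in (200, 1100, 10100):
    print(young, young_gen_copy_work(10_000, young, 100))
```

In this model the copying work falls from 10,000 MB to 1,000 MB to 100 MB as the young gen grows, because only the live set is ever copied and collections become rarer.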

> when it comes to how it's impacted by the memory bottleneck is one that I have some trouble understanding also - especially since you've been arguing for using up more memory for the exact same workload.

Memory bandwidth - at least as far as latency is concerned - is used when you have a cache miss. Once your live set is much bigger than your L3 cache, you get cache misses even when you merely want to read it. If you have good temporal locality (few cache misses), it doesn't matter how big your live set is, and the same is true if you have bad temporal locality (many cache misses).

> which, AIUI, is also what the talk is about

The talk focuses on tracing GCs, but it applies equally to manual memory management (as discussed in the Q&A; using less memory for the same algorithm requires CPU work regardless of whether it's manual or automatic).

> when that's a forced choice

I don't think tracing GCs are ever a forced choice. They keep getting chosen over and over for heavy workloads on machines with >= 1GB/core because they offer a more attractive tradeoff than other approaches for some of the most popular application domains. There's little reason for that to change unless the economics of DRAM/CPU change significantly.


Ygg2 341 days ago

> It doesn't go without saying that compaction involves a lot of memory traffic

It definitely tracks with my experience. Did you see Chrome on AMD EPYC with 2TB of memory? It reached only about 10% memory utilization but over 46% CPU at around 6000 tabs. Memory usage climbed steeply at first but got overtaken by CPU usage.


pron 341 days ago

I have no idea what it's using its CPU on, whether it has anything to do with memory management, or what memory management algorithm is in use. Obviously, the GC doesn't need to do any compaction if the program isn't allocating, and the program can only allocate if it's actually doing some computation. Also, I don't know the ratio of live set to total heap. A tracing GC needs to do very little work if most of the heap is garbage (i.e. the ratio of live set to total memory is low), but any form of memory management - tracing or manual - needs to do a lot of work if the ratio is high. Remember, a tracing-moving GC doesn't spend any cycles on garbage; it spends cycles on live objects only. Giving it more heap (assuming the same allocation rate and live set) means more garbage per cycle and less CPU consumption (as GC cycles are less frequent).

All you know is that CPU is exhausted before the RAM is, which, if anything, means that it may have been useful for Chrome to use more RAM (and reduce the liveset-to-heap ratio) to reduce CPU utilisation, assuming this CPU consumption has anything to do with memory management.

There is no situation in which, given the same allocation rate and live set, adding more heap to a tracing GC makes it work more. That's why in the talk he says that a DIMM is a hardware accelerator for memory management if you have a tracing-moving collector: increase the heap and voila, less CPU is spent on memory management.

That's why tracing-moving garbage collection is a great choice for any program that spends a significant amount of CPU on memory management, because then you can reduce that work by adding more RAM, which is cheaper than adding more CPU (assuming you're not running on a RAM-constrained machine, such as a small embedded device).


mwcampbell 340 days ago

> (outside of very RAM-constrained hardware, at least)

I've spent much of my career working on desktop software, especially on Windows, and especially programs that run continuously in the background. I've become convinced that it's my responsibility to treat my users' machines as RAM-constrained, and, outside of any long-running compute-heavy loops, to value RAM over CPU as long as the program has no noticeable lag. My latest desktop product was Electron-based, and I think it's pretty light as Electron apps go, but I wish I'd had the luxury of writing it all in Rust so it could be as light as possible (at least one background service is in Rust). My next planned desktop project will be in Rust.

A recent anecdote has reinforced my conviction on this. One of my employees has a PC with 16 GB of RAM, and he couldn't run a VMware VM with 4 GB of guest RAM on that machine. My old laptop, also with 16 GB of RAM, had plenty of room for two such VMs. I didn't dig into this with him, but I'm guessing that his machine is infested with crap software, much of which is probably using Electron these days, each program assuming it can use as much RAM as it wants. I want to fight that trend.


pron 340 days ago

It's perfectly valid to choose RAM over CPU. What isn't valid is believing that this tradeoff doesn't exist. However, cloud deployments are usually more CPU-constrained than RAM-constrained, so it's important to know that more RAM can be used to save CPU when significant processing is spent on memory management.
