|
Personally, I'm disappointed that there is too much superstitions and assumptions about GC designs going around, which lead to the discussion like this one that treats specialized designs with explicit tradeoffs, which both Go and BEAM are, as universally superior options. Or discussions that don't recognize that GC is in many scenarios an optimization over malloc/free and a significant complexity reduction over explicit management of arenas. When it comes to Java - it has multiple GC implementations with different tradeoffs and degree of sophistication. I'm not very well versed in their details besides the fact that pretty much all of them are quite liberal with the use of host memory. So the way I approach it is by assuming that at least some of them resemble the GC implementation in .NET, given extensive evidence that under allocation-heavy scenarios they have similar (throughput) performance characteristics. As for .NET itself, in server scenarios, it uses SRV GC which has per-core heaps (the count is sizing is now dynamically scalable per workload profile, leading to much smaller RAM footprint) and multi-threaded collection, which lends itself to very high throughput and linear scaling with cores even on very large hosts thanks to minimal contention (think 128C 1TiB RAM, stometimes you need to massage it with flags for this, but it's nowhere near the amount of ceremony required by Java). Both SRV and WKS GC implementations use background collection for Gen2, large and pinned object heaps. Collection of Gen0 and Gen1 is pausing by design as it lends itself for much better throughput and pause times are short enough anyway. They are short enough that modern .NET versions end up having better p99 latency than Go on multi-core throughput saturated nodes. Given decent enough codebase, you only ever need to worry about GC pause impact once you go into the territory of systems with hard realtime requirements. One of the better practical examples of this that exists in open source is Osu! which must run its game loop 1000hz - only 1ms of budget! This does pose challenges and requires much more hands-on interaction with GC like dynamically switching GC behaviopr depending on scenario: https://github.com/dotnet/runtime/issues/96213#issuecomment-... This, however, would be true with any language with automatic memory management, if it's possible to implement such a system in it in the first place. |
Small things like tuples (combining multiple values in outputs) and out parameters relaxes the burden on the runtime since programmers don't need to create objects just to send several things back out from a function.
But the real kicker probably comes since the lowlevel components, be it the Http server with ValueTasks and the C# dictionary type getting memory savings just by having struct types and proper generics. I remember reading some article from years ago about re-writing the C# dictionary class that they reduced memory allocations by something like 90%.