by neonsunset 636 days ago

Historically, at its inception, .NET's GC was written in LISP and then transpiled to C++ with a custom converter. It is still a single-file implementation, but I'm afraid it's not 1000 but 53612 lines instead as we speak :)

Well, that's not one file per se: there is more code and "supporting" VM infrastructure that makes GC work in .NET as well as it does (it's a precise, tracing, generational, moving GC), so the statement that it pushes complexity onto the user and underperforms could not be further from the truth. None of the JVM GC implementations maps to .NET's 1:1, but there are many similarities with Shenandoah, Parallel, and some aspects of G1. In general, .NET is moving in the opposite direction to Java GCs - it already has great throughput, so the goal is to minimize the amount of memory it uses to achieve it while balancing the time spent in GC (DATAS currently targets up to 3% CPU time). You also have to remember that the average .NET application has much lower allocation traffic.
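
The generational part is easy to observe from C#. This is a minimal sketch assuming .NET 5 or later (top-level statements); the variable name is illustrative, and promotion is a runtime implementation detail, so the printed generations (typically 0, then 1, then 2) are not guaranteed:

```csharp
using System;

var survivor = new byte[64];

// A freshly allocated small object starts in generation 0.
Console.WriteLine($"after allocation: gen {GC.GetGeneration(survivor)}");

// Surviving a collection typically promotes an object by one generation,
// up to GC.MaxGeneration (2 in CoreCLR).
GC.Collect();
Console.WriteLine($"after 1st GC:     gen {GC.GetGeneration(survivor)}");
GC.Collect();
Console.WriteLine($"after 2nd GC:     gen {GC.GetGeneration(survivor)}");

// GC.Collect() with no arguments performs a full (gen 2) collection.
Console.WriteLine($"gen2 collections so far: {GC.CollectionCount(2)}");
```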

In addition to that, without arguing the pros and cons of runtime simplicity (because I believe there is merit to Go's philosophy), .NET's CoreCLR implementation is anything but simple. So the statement does not reflect reality at all - it makes different tradeoffs, sure, but together with the CIL spec and C# design it has historically made better decisions than the JVM and Java, ones that lend themselves more naturally to achieving high performance: no interpreter stage, only intermediate compilations have to pay for OSR support, all method calls are non-virtual by default, true generics with struct monomorphization, and so on and so forth.

Another good example of the runtime doing truly heavy lifting on behalf of the user is byref pointers aka 'ref's - they can point to _any_ memory like the stack, the GC heap, unmanaged memory or even device-mapped pages (all transparently wrapped into Span<T>!), and the runtime emits precise tracking data so it can update them if they happen to point into object interiors, without imposing any measurable performance loss. It takes quite a bit of compiler and GC infrastructure to make this work (exact GC register state for byrefs at any safepoint, brick tables for efficiently scanning referenced heap ranges, etc.).
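
A minimal sketch of a byref into an object interior, assuming .NET 5 or later (names are illustrative). `slot` points at an element inside a heap-allocated array; if the forced compacting collection relocates the array, the GC updates that interior pointer, so the later write still lands in the right element. Whether the array actually moves during this particular collection is up to the runtime:

```csharp
using System;

int[] numbers = new int[16];

// A managed pointer ("byref") into the interior of a GC heap object.
ref int slot = ref numbers[5];

// Force a full, blocking, compacting collection. The array may be relocated;
// if so, the GC updates `slot` along with every other reference to the array.
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true, compacting: true);

slot = 42;                     // write through the byref
Console.WriteLine(numbers[5]); // 42

// The same kind of byref can just as well point at a stack local.
int local = 0;
ref int toLocal = ref local;
toLocal = 7;
Console.WriteLine(local);      // 7
```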

List of references (not exhaustive):

High-level overview (it needs to be updated but is a good starting point): https://github.com/dotnet/runtime/blob/main/docs/design/core...

Implementation (the 53612 line file): https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/g...

.NET GC internals lectures by Konrad Kokosa (they are excellent even if you don't use .NET): https://www.youtube.com/watch?v=8i1Nv7wGsjk

Articles on memory regions:

https://devblogs.microsoft.com/dotnet/put-a-dpad-on-that-gc/

https://maoni0.medium.com/write-barrier-optimizations-in-reg...

https://itnext.io/how-segments-and-regions-differ-in-decommi...

Articles on DATAS:

https://github.com/dotnet/core/blob/main/release-notes/9.0/p... (quick example of the kind of heap size reduction applications could see)

https://maoni0.medium.com/dynamically-adapting-to-applicatio...

kaba0 636 days ago

I did write ‘simple’, but obviously meant simpleR. A performant runtime will still require considerable complexity. Also, C# doesn’t underperform, and I never said it does; that's partially because the whole platform has access to lower-level optimizations that avoid allocating in the first place, as you mention (but Span et alia does make the language considerably more complex than Java, which was my point).

But on the GC side, it quite objectively has worse throughput than Java’s; one very basic data point is the binary-trees benchmark in the Benchmarks Game. This may or may not be a performance bottleneck in a given application; that’s beside the point. (As an additional data point, Swift is utterly bad on this benchmark, finishing in 17 s versus 2.59 s for Java and 4.61 s for C#, because its reference-counting GC has much worse throughput than tracing GCs.) But you are the one who already linked to this benchmark in this thread, so you do know it.

neonsunset 636 days ago

Do Go slices make it more complex? :)

Span<T> makes the language simpler from both the user's and the C#-to-IL-bytecode point of view; all the complexity is in the runtime (well, not entirely anymore - there's ref T lifetime analysis now). On that note, Java does not seem to have a generic slice type like ArraySegment<T>, which predates spans. I can see it has ByteBuffer, CharBuffer, IntBuffer, AbstractEnterpriseIntProviderFactoryBuffer (/s), etc. from NIO, as well as sub-Lists(?) and using Streams in the style of LINQ's Skip+Take.
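
To illustrate what a generic slice type means here, a small sketch assuming .NET 5 or later (the values are arbitrary): both ArraySegment<T> and Span<T> are views over the same array, so writes go through to it without copying:

```csharp
using System;

int[] data = { 10, 20, 30, 40, 50 };

// ArraySegment<T>: a generic (array, offset, count) view that predates spans.
var segment = new ArraySegment<int>(data, 1, 3); // covers 20, 30, 40
segment[0] = 21;                                 // writes through to data[1]

// Span<T> generalizes the same idea to any contiguous memory.
Span<int> span = data.AsSpan(1, 3);              // the same three elements
span[1] = 31;                                    // writes through to data[2]

Console.WriteLine(string.Join(", ", data));      // 10, 21, 31, 40, 50
```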

Spans are very easy to use, and advertising them as an advanced type at their inception was a short-lived mistake. Since then, they have been widely adopted throughout the ecosystem.

After all, it's quite literally just

    var text = "Hello, World!".AsSpan();    // ReadOnlySpan<char>, no copy
    var hello = text[..text.IndexOf(',')];  // "Hello"
    var twoChars = hello[..2];              // "He"

And, to emphasize, they transparently work with stack buffers, arrays, unmanaged memory and anything in-between. You can even reference a single field from an object:

    var user = (Name: "John", DoB: new DateTime(1989, 1, 1));
    ProcessStrings(new(ref user.Name));

    // Roslyn emits an inline array struct, from which a span is constructed
    // It's like T... varargs in Java but guaranteed zero-cost
    // In C# 13, params gains support for spans, so many existing call sites
    // of methods that used to take params T[] can bind to span overloads
    // instead, completely eliding allocations or even letting the compiler
    // reference constant arrays baked into the binary
    ProcessStrings(["hello", "world"]);

    void ProcessStrings(Span<string> values) { /* ... */ }
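
The point that spans work transparently over different kinds of memory can be sketched with one hypothetical helper, `SumBytes`, that takes a ReadOnlySpan<byte> and is fed a stack buffer, an array and a slice of that array. Unmanaged memory works the same way through the `Span<T>(void*, int)` constructor, but that requires an unsafe context, so it is left out to keep the sketch compilable without extra project settings. Assumes .NET 5 or later:

```csharp
using System;

// Hypothetical helper: one signature, any contiguous memory.
static int SumBytes(ReadOnlySpan<byte> bytes)
{
    int sum = 0;
    foreach (byte b in bytes)
        sum += b;
    return sum;
}

// 1. A stack buffer (no heap allocation, no unsafe code needed).
Span<byte> onStack = stackalloc byte[4];
onStack.Fill(1);

// 2. A regular GC heap array; arrays convert implicitly to spans.
byte[] onHeap = { 1, 2, 3 };

// 3. A slice of that array, still without copying.
Span<byte> tail = onHeap.AsSpan(1);

Console.WriteLine(SumBytes(onStack)); // 4
Console.WriteLine(SumBytes(onHeap));  // 6
Console.WriteLine(SumBytes(tail));    // 5
```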

On binary-trees - indeed, the results are interesting, and Java demonstrates consistently lower CPU time cost to achieve similar or higher throughput (look at the distribution of benchmark results). It is a stress test for allocation and collection throughput, yes. However, Java benchmarks also tend to consume consistently more memory, even in allocation-heavy scenarios: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

In any case, I have asked around for more data on a detailed comparison of heap profiles between G1, ZGC and Parallel, and will post it here for more context if I get a response. It's an interesting topic.