by astrange 638 days ago
You're lecturing me about my job here. I don't need to learn nothin'.

> reads become writes with ARC

That's not a big problem (it is a problem but a smaller one) since you can choose a different tradeoff wrt whether you keep the reference counting info on the same page or not. There's other allocator metadata with the same issue though.

A more interesting one comes up with GC too; if you're freeing all the time, everyone compresses their swap these days, which means zeroing the freed allocations is suddenly worth it because it compresses so much better.
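To make the compression point concrete, here is a rough C# sketch, not how any OS actually compresses memory. It runs a 16 KiB buffer through Deflate twice: once filled with pseudo-random bytes standing in for stale freed data, once zeroed. The page size, the `CompressedSize` helper and the choice of Deflate are all assumptions for illustration, and real stale heap data is less random than this, so the gap shown here is exaggerated.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical stand-in for a freed 16 KiB page.
const int PageSize = 16 * 1024;

// Case 1: the allocator leaves stale object data behind on free.
byte[] stalePage = new byte[PageSize];
new Random(42).NextBytes(stalePage);

// Case 2: the allocator zeroes the memory on free.
byte[] zeroedPage = new byte[PageSize];

long staleSize = CompressedSize(stalePage);
long zeroedSize = CompressedSize(zeroedPage);

Console.WriteLine($"stale page:  {staleSize} bytes after compression");
Console.WriteLine($"zeroed page: {zeroedSize} bytes after compression");

// Deflate is only a stand-in; OS memory compressors use their own algorithms.
static long CompressedSize(byte[] data)
{
    using var output = new MemoryStream();
    using (var deflate = new DeflateStream(output, CompressionLevel.Fastest, leaveOpen: true))
    {
        deflate.Write(data, 0, data.Length);
    }
    // The DeflateStream is disposed (and flushed) here, so Length is final.
    return output.Length;
}
```

The zeroed buffer should shrink to a tiny fraction of the stale one, which is the effect that makes zeroing on free pay off once swap is compressed.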

> Originally, Swift was meant to use GC, but this failed because Apple could not integrate it well enough with existing Objective-C code, leading to a very crash-prone solution.

It was Objective-C that had the GC (a nice compacting one too) and it failed mostly for that reason, but has not come back because of the performance issues I mentioned.

> Also, JavaScript has nothing to do with the lower in abstraction languages discussed in this chain of comments.

Oh, people definitely want to use it in the same places and will if you don't stop them. See how everyone's writing apps in Electron now.

> A more interesting one comes up with GC too; if you're freeing all the time, everyone compresses their swap these days, which means zeroing the freed allocations is suddenly worth it because it compresses so much better.

Moving GCs solve it much more elegantly, in my opinion, and Java is just so far ahead of everyone else in this category (like, literally the whole academic field is just Java GCs) that not mentioning it is a sin.

> literally the whole academic field is just Java GCs

Not necessarily a good thing. While reading Java-related papers I found myself constantly thinking "damn, they wrote a paper for something that is just 2.5 smaller pull-requests in dotnet/runtime". I wouldn't put the modern state of academia as the shining example...

What are you even talking about? C# has a famously simplistic GC which is basically one big, 1000-line file. C# has very different tradeoffs compared to Java: it pushes complexity to the user, making its runtime simple. Java does the reverse, keeping the language very simple, but the runtime is just eons ahead of everything else. Like, call me when any other platform has a moving GC that stops the world for less than a millisecond independent of heap size, like ZGC. Or just a regular GC that has throughput similar to G1.

Historically, at its inception, .NET's GC was written in LISP and then transpiled to C++ with a custom converter. It is still a single-file implementation, but I'm afraid it's not 1000 but 53612 lines as we speak :)

Well, that's not one file per se and there is more code and "supporting" VM infrastructure to make GC work in .NET as well as it does (it's a precise tracing generational moving GC), so the statement that it pushes complexity onto the user and underperforms could not be further from the truth. None of the JVM GC implementations maps to .NET 1:1, but there are many similarities with Shenandoah, Parallel, and some of the G1 aspects. In general, .NET is moving in the opposite direction to Java GCs - it already has great throughput, so the goal is to minimize the amount of memory it uses to achieve that, while balancing the time spent in GC (DATAS targets up to 3% CPU time currently). You also have to remember that the average .NET application has much lower allocation traffic.

In addition to that, without arguing on pros and cons of runtime simplicity (because I believe there is merit to Go's philosophy), .NET's CoreCLR implementation is anything but simple. So the statement does not correlate to reality at all - it makes different tradeoffs, sure, but together with the CIL spec and C# design it makes historically better decisions than JVM and Java, which lend themselves more naturally to achieving high performance - no interpreter stage, only intermediate compilations have to pay for OSR support, all method calls are non-virtual by default, true generics with struct monomorphization and so on and so forth. Another good example of the runtime doing truly heavy lifting on behalf of the user is byref pointers aka 'ref's - they can point to _any_ memory like stack, GC heap, unmanaged or even device mapped pages (all transparently wrapped into Span<T>!), and the runtime emits precise data for their tracking to update them if they happen to be pointers to object interiors, without imposing any measurable performance loss - it takes quite a bit of compiler and GC infrastructure to make this work (exact register state for GC data for any safepoint for byrefs, brick tables for efficiently scanning referenced heap ranges, etc.).
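A minimal sketch of what that looks like from the user side, assuming .NET 7 or later (for the `Span<T>(ref T)` constructor): one hypothetical `Sum` method consumes a whole heap array, an interior slice of it, a stack buffer and a single field, with no copies. Unmanaged memory is left out only because wrapping a raw pointer needs an `unsafe` context.

```csharp
using System;

int[] heapArray = { 1, 2, 3, 4 };

// Interior pointer into the GC heap: elements 1 and 2 of the array.
Span<int> interior = heapArray.AsSpan(1, 2);

// Stack memory, usable in safe code because it is assigned to a Span<int>.
Span<int> stackBuffer = stackalloc int[] { 10, 20, 30 };

// A span of length 1 over a single field of a local tuple.
var point = (X: 5, Y: 7);
Span<int> singleField = new Span<int>(ref point.Y);

int fromHeap = Sum(heapArray);
int fromInterior = Sum(interior);
int fromStack = Sum(stackBuffer);
int fromField = Sum(singleField);

Console.WriteLine($"{fromHeap} {fromInterior} {fromStack} {fromField}");

// Writes go through to the original storage, since a span is just a ref plus a length.
interior[0] = 100;
singleField[0] = 70;
Console.WriteLine($"{heapArray[1]} {point.Y}");

// Sum is a hypothetical helper; it neither knows nor cares where the memory lives.
static int Sum(ReadOnlySpan<int> values)
{
    int total = 0;
    foreach (int value in values)
    {
        total += value;
    }
    return total;
}
```

The interior slice is the interesting case for the GC: if the array moves during a compaction, the runtime has to update the byref inside `interior` as well.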

List of references (not exhaustive):

High-level overview (it needs to be updated but is a good starting point): https://github.com/dotnet/runtime/blob/main/docs/design/core...

Implementation (the 53612 line file): https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/g...

.NET GC internals lectures by Konrad Kokosa (they are excellent even if you don't use .NET): https://www.youtube.com/watch?v=8i1Nv7wGsjk

Articles on memory regions:

https://devblogs.microsoft.com/dotnet/put-a-dpad-on-that-gc/

https://maoni0.medium.com/write-barrier-optimizations-in-reg...

https://itnext.io/how-segments-and-regions-differ-in-decommi...

Articles on DATAS:

https://github.com/dotnet/core/blob/main/release-notes/9.0/p... (quick example of the kind of heap size reduction applications could see)

https://maoni0.medium.com/dynamically-adapting-to-applicatio...

I did write 'simple', but obviously meant simpleR. A performant runtime will still require considerable complexity. Also, C# doesn't underperform, I never said that, partially because the whole platform has access to lower-level optimizations that avoid allocating in the first place, as you mention (but Span et al. do make the language considerably more complex than Java, which was my point).

But on the GC side it quite objectively has worse throughput than Java's; one very basic data point would be the binary trees benchmark on the Benchmarks Game. This may or may not be a performance bottleneck in a given application, that's beside the point. (As an additional data point, Swift is utterly bad on this benchmark, finishing in 17 sec, while Java does it in 2.59 and C# in 4.61, due to Swift having reference counting GC, which has way worse throughput than tracing GCs.) But you are the one who already linked to this benchmark on this thread, so you do know it.

Do Go slices make it more complex? :)

Span<T> makes the language simpler from both the user and C# to IL bytecode point of view, all the complexity is in the runtime (well, not exactly anymore - there's ref T lifetime analysis now). On that note, Java does not seem to have a generic slice type, like ArraySegment<T> which predates spans. I can see it has ByteBuffer, CharBuffer, IntBuffer, AbstractEnterpriseIntProviderFactoryBuffer (/s), etc from NIO as well as sub-Lists(?) and using Streams in the style of LINQ's Skip+Take.

Spans are very easy to use, and advertising them as an advanced type was a short-lived mistake at their inception. Since then, they have been adopted prominently throughout the ecosystem.

After all, it's quite literally just

  var text = "Hello, World!".AsSpan();
  var hello = text[..text.IndexOf(',')];
  var twoChars = hello[..2];

And, to emphasize, they transparently work with stack buffers, arrays, unmanaged memory and anything in-between. You can even reference a single field from an object:

    var user = (Name: "John", DoB: new DateTime(1989, 1, 1));
    ProcessStrings(new(ref user.Name));

    // Roslyn emits an inline array struct, from which a span is constructed.
    // It's like T... varargs in Java but guaranteed zero-cost.
    // In C# 13, params gains span support, so many existing callsites
    // that used to bind to params T[] start accepting spans instead,
    // completely eliding allocations or even allowing the compiler
    // to reference constant arrays baked into the binary
    ProcessStrings(["hello", "world"]);

    void ProcessStrings(Span<string> values) { /* ... */ }

On binary-trees - indeed, the results are interesting and Java demonstrates consistently lower CPU time cost to achieve similar or higher throughput (look at benchmark results distribution). It is a stress-test for allocation and collection throughput, yes. However, Java benchmarks also tend to consume consistently more memory even in allocatey scenarios: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

In any case, I have asked around for more data on a detailed comparison of heap profiles between G1, ZGC and Parallel and will post it here if I get a response, to provide more context. It's an interesting topic.

If your only points of reference are Objective-C and Swift, and you have not looked at how .NET's or Go's (which makes very different tradeoffs w.r.t. small memory footprint) GCs work, it might be interesting to re-examine prior assumptions in light of modern designs (I can't say Go is modern per se, but it is interesting nonetheless).

Also, .NET tends to heavily zero memory in general, as the spec dictates that fields, variables, array contents, etc. must be initialized to their default values before use (which is zero). The compiler can and will elide unneeded zeroing where it can see it is redundant, but the point is that .NET's heaps should compress quite well (and this seems to be the case on M-series devices).
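A quick sketch of that guarantee as seen from C#; the variable names are just for illustration. Fresh arrays and default struct values come back all-zero, and skipping the zeroing is an explicit opt-in via `GC.AllocateUninitializedArray` (.NET 5+), whose contents are unspecified.

```csharp
using System;

// A fresh array of value types is guaranteed to be all zero bytes.
byte[] buffer = new byte[4096];
bool bufferIsZeroed = Array.TrueForAll(buffer, b => b == 0);

// A fresh array of references is all null (a zero bit pattern as well).
string[] names = new string[2];
bool referencesAreNull = names[0] is null && names[1] is null;

// The default value of a struct is its all-zero representation.
bool structIsZeroed = default(Guid) == Guid.Empty;

// Explicit opt-out: the runtime may skip zeroing, so contents are unspecified
// and must be overwritten before being read.
byte[] scratch = GC.AllocateUninitializedArray<byte>(4096);
scratch.AsSpan().Fill(0xFF);

Console.WriteLine($"{bufferIsZeroed} {referencesAreNull} {structIsZeroed} {scratch.Length}");
```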

There are popular apps written in C# on the platform, but they're Unity games, which use il2cpp and I believe still use Boehm GC. I think this demonstrates a different point, since even a bad GC apparently doesn't stop them from shipping a mobile game… but it is a bad GC.

(Games typically don't care about power efficiency much, as long as the phone can keep up rendering speed anyway.)

> Also, .NET tends to heavily zero memory in general, as the spec dictates that fields, variables, arrays contents, etc. must be initialized to their default values before use (which is zero).

Same for most other languages, but there's a time difference between zeroing on free and zeroing on allocation. Of course, once you've freed everything on the page there are ways to zero the page without swapping it back in. (just tell the OS to zero it next time it reads it)

Yeah, Unity has terrible GC, even with incremental per-frame collection improvement. It's going to be interesting to look at the difference once they finish migration to CoreCLR.

If you'd like to look at a complex project, you can try Ryujinx: https://www.ryujinx.org. It even has native integration[0] with Apple Hypervisor to run certain games as-is on ARM64 Macs. There's also Metal back-end in the works.

Other than that, any new .NET application runs on macOS provided it doesn't use platform-specific libraries (something that uses Linux dependencies or kernel APIs, or Windows ones). My daily driver is an MBP.

A side note: on macOS .NET does not use region-based heaps yet and uses the older segment-based ones. This means somewhat worse memory usage efficiency, but nothing world-ending.

[0]: https://github.com/Ryujinx/Ryujinx/tree/73f985d27ca0c85f053e...