Hacker News
by rdtsc 752 days ago
Do they have isolated heaps and can they be preempted, even if they spin in an infinite loop doing some CPU intensive things?

Tasks are not processes, and that would be a wrong thing to do, and so would be "isolated heaps", given the performance requirements faced by .NET: you do want to share memory through concurrent data structures (which e.g. channels are, despite what Go apologists say), and easily await them when you want to.

CSP, while nice on paper, has the same issues as e.g. partitioning in Kafka, just at a much lower level where it becomes a critical bottleneck: you can't trivially "fork" and "join" the flows of execution, which a well-implemented async model enables.
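
To make "fork" and "join" concrete, here is a minimal sketch in Python's asyncio (not C#, purely as an illustration of the async model). `fetch` and `handle_request` are made-up names, and the sleeps stand in for real I/O:

```python
import asyncio


async def fetch(name: str, delay: float) -> str:
    # Hypothetical stand-in for real I/O (a DB call, an HTTP request, ...).
    await asyncio.sleep(delay)
    return f"{name} done"


async def handle_request() -> list:
    # "Fork": start two flows of execution without waiting on either.
    a = asyncio.create_task(fetch("a", 0.02))
    b = asyncio.create_task(fetch("b", 0.01))
    # Ordinary code can run here while both tasks are in flight.
    # "Join": wait for both; results come back in argument order,
    # not completion order.
    return list(await asyncio.gather(a, b))


if __name__ == "__main__":
    print(asyncio.run(handle_request()))  # ['a done', 'b done']
```

The C# equivalent would be starting two tasks and awaiting them later; the point is only that the fork and the join sit in otherwise regular code.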

It's not "what about x" but rather how you end up applying the concurrency model in practice, and C# tasks let you idiomatically mix concurrency and/or parallelism into otherwise regular code (as you can see in the example).

I'm just clarifying the parent comment: concurrency in .NET is not like in Java/C++/Python (even if the latter does share similarities, there are constraints imposed by Python itself).

> and that would be a wrong thing to do, and so would be "isolated heaps" - you do want to share memory through concurrent data structures (which e.g. channels are despite what go apologists say), and easily await them when you want to.

It depends on the context. In some contexts absolutely not. If we share memory, and these tasks start modifying global data or taking locks and then crash, can those tasks be safely restarted? Can we still reason about the state of the whole node?

> CSP, while is nice on paper

Not sure if Erlang's model is CSP or actors (it started as neither, actually), but it's not just nice on paper. We have nodes with millions of concurrent processes running comfortably, and I know they can crash, or that I can restart various subsets of them, safely. That's no small thing and it's not just paper-theoretical.

RE: locks and concurrently modified data-structures

It comes down to the kind of lock being used. Scenarios which require strict data sharing handle it as they see fit: for recoverable states the lock can simply be released in a `finally` block. The synchronous/blocking `lock` statement does this automatically. All concurrent containers offered by the standard library either do not throw, or their exceptions indicate a wrong operation/failed precondition/etc. and can be recovered from (as most exceptions in C# can, in general).
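
A rough sketch of the release-in-`finally` pattern, written in Python rather than C# only for illustration. `risky_update`, `safe_update` and `counter` are hypothetical names; the `with` form is loosely analogous to the C# `lock` statement in that release is automatic:

```python
import threading

lock = threading.Lock()
counter = {"value": 0}  # shared state guarded by `lock`


def risky_update(fail: bool) -> None:
    # Explicit form: the lock is released even if the body throws.
    lock.acquire()
    try:
        if fail:
            raise ValueError("failed precondition")
        counter["value"] += 1
    finally:
        lock.release()


def safe_update() -> None:
    # Scoped form: release happens automatically on exit,
    # whether the block returns normally or raises.
    with lock:
        counter["value"] += 1
```

Note this only guarantees the lock itself is not left held after a failure; whether the guarded data is still consistent is up to the code inside the block.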

This does not preclude the use of channels/mailboxes and other actor patterns (after all, .NET has Channel<T> and ConcurrentQueue<T>, or, if you would like to go from 0 to 100, Akka and Orleans; and the language offers all the tools to write your own fast implementation should you want that).

Overall, I can see the value of switching to Erlang if you are using a platform/language with much worse concurrency primitives, but coming from F# and C#, Erlang and Elixir personally look like a sidegrade, as .NET applications tend to scale really well with cores even when implemented sloppily.

If you use a 96-core machine, or 96 individual machines with a single core each, the Erlang code is going to look pretty much the same.
What value does an isolated heap offer for memory-safe languages?

Task exceptions can simply be handled via try-catch at the desired level. Millions of concurrently handled tasks is not that high of a number for .NET's threadpool. It's one thing among many that is a "nothingburger" in the .NET ecosystem but somehow ends up being sold as a major advantage in other languages (you can see it with other features too: Nest.js as a "major improvement" for back-end, while it just looks like something we had 10 years ago; "structured concurrency", which is simple task interleaving; etc.).
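
As a sketch of "handled at the desired level", again in Python's asyncio rather than C# and with made-up names (`worker`, `supervise`): a failing task stores its exception, and it surfaces at whatever point the caller chooses to await that task.

```python
import asyncio


async def worker(n: int) -> int:
    await asyncio.sleep(0)  # yield to the event loop once
    if n == 2:
        raise RuntimeError(f"worker {n} failed")
    return n * 10


async def supervise() -> list:
    tasks = [asyncio.create_task(worker(n)) for n in range(4)]
    results = []
    for t in tasks:
        try:
            # The task's exception is re-raised here, at the await.
            results.append(await t)
        except RuntimeError as exc:
            # One failed task does not take the others down.
            results.append(str(exc))
    return results


if __name__ == "__main__":
    print(asyncio.run(supervise()))  # [0, 10, 'worker 2 failed', 30]
```

This shows only the exception-handling side; it says nothing about what shared state a failed task may have left behind.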

It's a different, lower-level model, but it comes with the benefit that you are not locked into one particular (even if good) way of doing concurrency, as you are in Erlang.

Briefly, the tradeoff that Erlang and its model of independent process heaps make is that garbage collection (and execution in general) occurs per-process. In practical terms, this means you have lots of little garbage collections and far fewer "large" (think "full OS process heap") collections.

This provides value in a few ways:

- conceptually: it is very simple. i.e., the garbage collection of one process is not logically tied to the garbage collection of another.

- practically: it lends itself well to low-latency operations, where the garbage collection of one process is able to happen concurrently with the normal operation of another process.

Please note that I am not claiming this model is superior to any other. That is of course situational. I am just trying to be informative.

This is a good post with more information, if you're interested: https://hamidreza-s.github.io/erlang%20garbage%20collection%...

Thanks!
GC determinism is one of the things you get. Another one is non-cooperative asynchronous termination.
Pretty much all efficient GC implementations are inherently non-deterministic, even if predictable.

How can this improve predictability of GC impact?

No global GC. Each Erlang process does its own GC, and the GC only happens when the process runs out of space (i.e. the heap and stack meet).

You can, for example, configure a process to have enough initial memory that it never runs into GC. This is especially useful if you have a process that does a specific task before terminating. Once it terminates, the entire process memory is reclaimed.