by elteto 2314 days ago
How is runtime fault tolerance achieved? My understanding of Erlang is that the BEAM VM implements these capabilities (custom threads, supervision, restarts, hot reload) one level removed from, and above, the actual code, and that it implements its own user-space threading runtime in order to support them. But in Rust there is no such runtime (or is Bastion implementing one?), and Bastion seems to be used as a library. I'm very curious.

I think another way to frame my question would be: what is the basic unit of parallel execution in Bastion? A thread? Or a separate process? The README mentions lightweight processes and subprocesses, but it is rather vague about what these are.


By runtime fault tolerance they probably just mean the ability to write programmable supervisors that react to actors dying; nothing special. It's not as if a user-space process can do much more anyway, apart from catching signals and destroying the currently running actor that caused them.
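A minimal sketch of such a supervisor in plain Rust, assuming nothing from Bastion: `supervise` is a hypothetical name, each "actor" is just an OS thread, and a crash is a panic, which `JoinHandle::join` reports as `Err`. It relies on the default `panic = "unwind"` strategy; with `panic = "abort"` the whole process dies and there is nothing left to supervise.

```rust
use std::thread;

/// Hypothetical one-for-one supervisor: runs `work` on a fresh OS thread and,
/// if that thread panics, starts it again, up to `max_restarts` times.
/// `work` receives the attempt number so the restarts are observable.
fn supervise<F, R>(work: F, max_restarts: u32) -> Result<R, u32>
where
    F: Fn(u32) -> R + Send + Clone + 'static,
    R: Send + 'static,
{
    let mut restarts = 0;
    loop {
        let child = work.clone();
        let attempt = restarts;
        // A panic unwinds only the child thread; the supervisor sees it as Err from join().
        match thread::spawn(move || child(attempt)).join() {
            Ok(value) => return Ok(value),
            Err(_) if restarts < max_restarts => restarts += 1,
            Err(_) => return Err(restarts), // give up and report how many restarts were tried
        }
    }
}

fn main() {
    // Crashes on its first two attempts, then succeeds on the third.
    let flaky = |attempt: u32| {
        if attempt < 2 {
            panic!("simulated crash on attempt {attempt}");
        }
        attempt * 10
    };
    println!("{:?}", supervise(flaky, 3)); // prints Ok(20)
}
```

This only covers actors that die. An actor stuck in an infinite loop never returns from `join`, so the supervisor waits forever.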
> How is runtime fault-tolerance achieved?

An actor is:

1. A lightproc, which, as I understand it, is an async-spawned thread returning an (optional?) Future [0].

2. A ProcHandle [1] that lets you define process state (like the pid), control process execution (like cancel, suspend?), and listen on the progress of a given lightproc run by Bastion's executors [2], while the message-passing and supervision semantics are handled by Bastion [3].
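To make the pid / cancel / optional-output idea concrete, here is a std-only sketch. `ToyHandle`, `ToyProc`, `spawn` and `block_on` are hypothetical names and do not reproduce Bastion's actual `ProcHandle` API: the process is a boxed future, the handle holds a pid and a shared cancel flag, and the output is `None` when the process was cancelled.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical handle: a pid plus a shared cancellation flag.
struct ToyHandle {
    pid: u64,
    cancelled: Arc<AtomicBool>,
}

impl ToyHandle {
    fn cancel(&self) {
        self.cancelled.store(true, Ordering::Release);
    }
}

/// Hypothetical lightweight process: a boxed future whose output is `None`
/// if it was cancelled before it completed.
struct ToyProc<T> {
    inner: Pin<Box<dyn Future<Output = T>>>,
    cancelled: Arc<AtomicBool>,
}

impl<T> Future for ToyProc<T> {
    type Output = Option<T>;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Option<T>> {
        let this = self.get_mut(); // ToyProc is Unpin: its future is already boxed and pinned
        // Cancellation is only noticed when the executor polls this process.
        if this.cancelled.load(Ordering::Acquire) {
            return Poll::Ready(None);
        }
        this.inner.as_mut().poll(cx).map(Some)
    }
}

/// Wraps a future in a process and returns the handle that controls it.
fn spawn<F: Future + 'static>(pid: u64, fut: F) -> (ToyHandle, ToyProc<F::Output>) {
    let cancelled = Arc::new(AtomicBool::new(false));
    let handle = ToyHandle { pid, cancelled: Arc::clone(&cancelled) };
    let inner: Pin<Box<dyn Future<Output = F::Output>>> = Box::pin(fut);
    (handle, ToyProc { inner, cancelled })
}

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll, and park the thread until woken.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let (running, p1) = spawn(1, async { 40 + 2 });
    let (doomed, p2) = spawn(2, async { 7 });
    doomed.cancel();
    println!("pid {} -> {:?}", running.pid, block_on(p1)); // prints pid 1 -> Some(42)
    println!("pid {} -> {:?}", doomed.pid, block_on(p2)); // prints pid 2 -> None
}
```

Cancellation here is cooperative: it takes effect the next time the executor polls the future, so a future parked while waiting for a wake-up will not see it until something wakes it.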

Akka (https://akka.io/) on the JVM would be a better comparison to this than BEAM's implementation of actors, I think.

[0] https://github.com/bastion-rs/bastion/blob/2d9dc705962f30fbf...

[1] https://github.com/bastion-rs/bastion/blob/2d9dc705962f30fbf...

[2] https://github.com/bastion-rs/bastion/blob/2d9dc705962f30fbf...

[3] https://docs.rs/bastion/0.3.4/bastion/struct.Bastion.html

Erlang's use of m:n threading is orthogonal to fault-tolerance (perhaps not inside the implementation, but conceptually).
If an Erlang process crash cannot bring down the entire system, while a crash in Bastion's concept of a process can, then threading is an important part of fault tolerance, isn't it?
It definitely is not orthogonal. Suppose an OS thread goes into an infinite loop. How do you cleanly stop it (feel free to assume Linux/Windows/macOS)? In Erlang this is possible because of the custom threading implementation.
In Erlang that's possible because the program runs in a VM. Erlang could do the same with 1:1 and m:1 threading.
It's because the VM's interpreter calls into the scheduler within loop iterations. It has nothing to do with threads.
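A toy illustration of that point, assuming a made-up instruction model: each process is a closure that executes one "instruction" per call, and the scheduler gives it a fixed budget before taking control back (BEAM does something similar by counting reductions). Everything runs on a single OS thread, yet an infinitely looping process can still be killed, because the scheduler regains control between slices. `Step`, `Outcome`, `Proc`, `BUDGET` and `run` are hypothetical names.

```rust
use std::collections::VecDeque;

/// What one "instruction" of a toy process did.
enum Step {
    Continue,
    Exit(u64),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Exited(u64),
    Killed,
}

/// A toy lightweight process: a pid and a closure that runs one instruction per call.
struct Proc {
    pid: u32,
    step: Box<dyn FnMut() -> Step>,
}

/// Instructions a process may run before the scheduler takes control back.
const BUDGET: u32 = 4;

/// Round-robin scheduler on the current thread. Once `ticks` (total instructions
/// executed) reaches `kill_at`, the process `victim` is killed at its next turn.
fn run(mut queue: VecDeque<Proc>, victim: u32, kill_at: u64) -> Vec<(u32, Outcome)> {
    let mut finished = Vec::new();
    let mut ticks: u64 = 0;
    while let Some(mut p) = queue.pop_front() {
        // Between slices the scheduler is in charge, even if `p` is stuck in a loop.
        if p.pid == victim && ticks >= kill_at {
            finished.push((p.pid, Outcome::Killed));
            continue;
        }
        let mut exit = None;
        for _ in 0..BUDGET {
            ticks += 1;
            if let Step::Exit(code) = (p.step)() {
                exit = Some(code);
                break;
            }
        }
        match exit {
            Some(code) => finished.push((p.pid, Outcome::Exited(code))),
            None => queue.push_back(p), // budget used up: back of the queue
        }
    }
    finished
}

fn main() {
    // pid 1: an infinite loop that never exits on its own.
    let spinner = Proc { pid: 1, step: Box::new(|| Step::Continue) };
    // pid 2: counts to 10, then exits with the count.
    let mut n = 0u64;
    let counter = Proc {
        pid: 2,
        step: Box::new(move || {
            n += 1;
            if n == 10 { Step::Exit(n) } else { Step::Continue }
        }),
    };
    let outcomes = run(VecDeque::from(vec![spinner, counter]), 1, 20);
    println!("{:?}", outcomes); // prints [(2, Exited(10)), (1, Killed)]
}
```

The counter finishes even though the spinner never yields voluntarily. In this sketch, a looping process that is not the designated victim would keep `run` going forever; a real scheduler would take kill requests from other processes instead of a fixed argument.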
> How do you cleanly stop it

The JVM manages this, and there are many options:

- reading a flag
- patching the running code
- changing memory protection and causing a segfault
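The first option, reading a flag, looks like this in plain Rust. It is a sketch with a hypothetical `spin` function, where the busy loop polls an `AtomicBool` much as compiled JVM code polls at safepoints; it is only an analogy and does not show how HotSpot itself implements them.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// A CPU-bound loop that never finishes on its own. Its only "safepoint" is the
/// check of `stop` at the top of each iteration; std has no API for another
/// thread to forcibly end it.
fn spin(stop: Arc<AtomicBool>) -> u64 {
    let mut iterations: u64 = 0;
    while !stop.load(Ordering::Acquire) {
        iterations = iterations.wrapping_add(1);
    }
    iterations
}

fn main() {
    let stop = Arc::new(AtomicBool::new(false));
    let worker = {
        let stop = Arc::clone(&stop);
        thread::spawn(move || spin(stop))
    };
    thread::sleep(Duration::from_millis(50));
    stop.store(true, Ordering::Release); // ask the loop to stop at its next poll
    let n = worker.join().expect("worker panicked");
    println!("stopped cleanly after {n} iterations");
}
```

The catch is that the loop has to cooperate: a thread that never reads the flag cannot be stopped this way, which is why a managed runtime puts such checks into the code it compiles rather than relying on the programmer.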

> Suppose an OS thread goes into an infinite loop. How do you cleanly stop it (feel free to assume Linux/Windows/MacOS)?

ptrace it?