Hacker News new | ask | show | jobs
by GlenTheMachine 16 days ago
If you had no idea what a restorable sequence is the takeaway is about halfway down the OP:

“This is why Linux now provides rseq() which is a much more enlightened solution. With restartable sequences, you actually can get rid of both the mutex and atomics, while the OS continues to fully abstract scheduling. The way it works is you advise the kernel whenever your program enters a critical section of code that you don't want interrupted. It's probably going to be maybe 10 assembly instructions tops. The first assembly opcode should be a move instruction that sets the rseq_cs field. The last instruction needs to be the thing that makes the modification to your global data structure. Think of it sort of like a really tiny database transaction. What makes it go fast, is that the bidirectional communication with the kernel happens via shared memory.”

6 comments

That doesn't really explain it though, IMO. IIUC, it's a sequence of instructions that either runs to completion atomically or doesn't. If it is interrupted by anything the kernel jumps you to the abort/retry vector you set with a guarantee that the last instruction in the sequence was not executed.

(Based on my reading of the LWN article rwmj posted).

Yes, the API contract isn't "don't interrupt me during this critical section" it's "if you have to interrupt me during this critical section, go to this recovery/restart code".

There is a time-slice extension feature in the works that's roughly "please let me finish this critical section before you interrupt me". But a hard guarantee that userspace code won't be interrupted is probably untenable in a preemptive multitasking system.

> But a hard guarantee that userspace code won't be interrupted is probably untenable in a preemptive multitasking system.

I wonder why this is the case. Considering modern (personal) computers have more cores than available work (or won't starve other processes even if they hog a part of the available cores), I would not think it so horrible for an OS to offer a guarantee to some (maybe specially privileged) processes, that as long as they don't wait on a resource, they won't be interrupted.

We also have a perfectly workable model for describing such a state to the OS - priority inversion. Imagine a 'god mutex', which when acquired would boost your thread's priority to the maximum - as if the 'god process' would be waiting for said mutex and as long as your thread held it, it couldn't continue working, until your thread finished using it.

I think it would be a neat feature for certain real-time(ish) scenarios, like audio/video processing.

I think a lot of modern OS facilities are catering towards a world with much scarcer resources than our current one, for example, it would be perfectly fine for a process to assume at least a core part of its data is always in RAM and can't be swapped out.

When you get into low-level systems, like kernels, a lot of the need for special privileges comes from having to have these guarantees, which could be granted nowadays on a much more lenient basies.

Because poorly behaved processes can use this to lock up the machine.
Something not completely dissimilar to https://en.wikipedia.org/wiki/PCLSRing then?
> it's a sequence of instructions that either runs to completion atomically or doesn't

The way I read it, it either runs to completion in one go, or gets restarted from the beginning. This means the sequence as a whole isn't executed atomically, as the already-executed instructions during an interrupt aren't rolled back.

It can be used to build atomic actions, but it is up to the developer to create a sequence of instructions where the very last instruction "commits" the entire operation, with the side-effects of partial execution being harmless.

Yes, it's either atomic or the last instruction is guaranteed not to have run. I made this a little harder to read by inserting another clause in the sentence.
LWN has a good article: https://lwn.net/Articles/1033955/
I think it wasn't explained in a very accessible way. If I got the gist right, this essentially brings "per-CPU" synchronization to userland. It's typical in the kernel to have per-cpu data, while per-thread data is rare and typically impractical. There is a high number of threads managed by the kernel, most of which probably belong to a userland process, most of which do not participate in any given synchronisation scheme. Also threads are often too much of an abstraction for parallel programming needs, given that they are hiding for example cache effects. So it's natural to want to use per-cpu data instead of thread_local data in a userland process, I know I've been wishing for that many times.

With rseq, we can allocate in any userland process one instance of a given synchronisation data structure per each CPU. It's important to understand that userland code accessing per-cpu data structures cannot prevent being scheduled away from a CPU and being replaced by another thread (kernel code can block scheduler for short critical sections). Such a replacement thread may subsequently corrupt that same data that was still in the middle of the transaction. But we can make a subset of transactions safe at least: If a transaction gets committed in a single (final) atomic instruction, and we get kernel support for this transaction to be restarted in case there has been a schedule mid-way, this is a guarantee that at the time of commit, the entire transaction hasn't been interrupted by the scheduler. I.e. a kind of "mutual exclusion" guarantee.

Did I get that right?

You don't have to use it for this. For example, you can use it for your own transactional memory or hazard pointer scheme.
That’s clever — am I right to think it’s the intermediate solution between locks and full STM, implemented at the kernel level, and with zero abstraction cost?
It's in some sense a light form of STM. The key insight behind rseq(2) is that if the data is local to a given CPU the only way to get a race is if the kernel deschedules your program from that CPU at an inopportune time. If your operation can be aborted and restarted and the kernel has a mechanism to notify you when that needs to happen you can dispense with the overhead of "real" synchronization and just use a couple mov instructions to enter and exit the critical section.
I am not very well versed here but I think due to the requirement for assembly and single-instruction commit, practical uses of rseq is generally very simple. It is nowhere near the usefulness of locks.
Certainly not zero cost. Syscalls are heavy, that's why everyone prefers green threads.

And 2 more ops per rseq.

But rseq's are certainly cool.

I don't get it, how does this work with multiple processes running at 100% CPU time? Atomics are necessary for the CPU level.
I wonder what the speedup would be if runtimes like golang or iava were to adopt rseq on linux.