Hacker News new | ask | show | jobs
by compudj 2916 days ago
Sure, before getting it upstream, I had to:

- Gather a list of desiderata, ensuring we take into account a complete list of use-cases targeted by everyone active in the rseq discussions. This is crucially important to ensure discussions don't spin in circles going back and forth between different requirements,

- Redesign the uapi/linux/rseq.h ABI, making sure a single TLS store is needed to enter a rseq critical section, without requiring any extra registers as ABI. I have introduced the "rseq_cs" structure as critical section descriptor to do this,

- Optimize arm32 and x86 rseq critical sections for speed, by creating my own benchmark programs,

- Rewrite the kernel rseq implementation a few times so it follows the kernel coding style and ensure it pleases everyone caring about it,

- Present 2 talks about rseq at Linux Plumbers Conference,

- Go through various rounds of in person, email, and IRC discussions with Paul Turner, Peter Zijlstra, Andy Lutomirski, Boqun Feng, Paul E. McKenney, Thomas Gleixner, Ben Maurer, Linus Torvalds, and many others. Those were very constructive discussions bringing up everyone's concerns with respect to this new system call,

- Extend the rseq selftests, adding new testing strategies such as delay loops between "steps" of the critical section, thus increasing the likelihood of generating preemption races,

- Figure out nasty races only happening on NUMA systems after about a full day of stress-testing,

- Provide solutions for debugger single-stepping "lack of progress" problem if rseq is used when retrying on abort. It's basically the cpu_opv system call I plan to propose for 4.19. Meanwhile, without cpu_opv, rseq can still be used in ways to guarantee forward progress, but the abort code needs to use a partitioning strategy rather than a simple retry (e.g. going to a different memory pool in case of abort for a memory allocator),

- Harden the rseq mechanism for security, by adding a "signature" word before the abort label,

- Implement prototypes of lttng-ust and liburcu which use rseq, gathering benchmarks to validate the approach,

- Write rseq and cpu_opv man pages.

And this is just the items that were "forward progress" in the rseq adventure. I'm leaving out everything that were attempts at making things more generic that had to be thrown away.

1 comments

Thanks for this detailed and a bit overwhelmingly long list. When I saw the first Paul Turner techtalk on rseq (it was called something else - LPC - if I remember correctly), it seemed so simple, so obvious "just read this memory address and if there was an interruption, we have to retry".

But then of course real life is a lot more complex than slideware.

cpu_opv is new for me (no time for LWN these days), but looks simple, elegant and sort of obvious (again). Which makes me wonder why no one thought about it yet. (But of course this is probably my ignorance speaking.)

Thanks for pushing the limits!