|
|
|
|
|
by skitter
373 days ago
|
|
Fun post! An alternative to using futexes to store thread queues in kernel space is to store them yourself. E.g. the parking_lot[0] Rust crate, inspired by WebKit[1], uses only one byte to store the unlocked/locked/locked_contended state, and under contention uses the address of the byte to index into a global open-addressing hash table of thread queues. You look up the object's entry, lock said entry, add the thread to the queue, unlock it, and go to sleep. Because you know that there is at most one entry per thread, you can keep the load factor very low in order to keep the mutex fast and form the thread queue out of a linked list of thread-locals. Leaking the old hash on resizing helps make resizing safe. As a result, uncontended locks work the same as described in the blog post above; under contention, performance is similar to a futex too. But now your locks are only one byte in size, regardless of platform – while Windows allows 1-byte futexes, they're always 4 bytes on Linux and iirc Darwin doesn't quite have an equivalent api (but I might be wrong there). You also have more control over parked threads if you want to implement different fairness criteria, reliable timeouts or parking callbacks. One drawback of this is that you can only easily use this within one process, while at least on Linux futexes can be shared between processes. I've written a blog post[2] about using futexes to implement monitors (reëntrant mutexes with an associated condvar) in a compact way for my toy Java Virtual Machine, though I've since switched to a parking-lot-like approach. [0]: https://github.com/amanieu/parking_lot
[1]: https://webkit.org/blog/6161/locking-in-webkit
[2]: https://specificprotagonist.net/jvm-futex.html |
|
That's not a very useful property, though. Because inter-core memory works on cache-line granularities, packing more than one lock in a cache line is a Bad Idea™. Potentially it allows you to pack more data being protected by a lock with that data... but alignment rules means that you're going to invariably end up spending 4 or 8 bytes (via a regular integer or a pointer) on that lock anyways.