Hacker News new | ask | show | jobs
by maffydub 3137 days ago
Borrowing from https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.h... (linked to from the original post), the following code tries to access a lock-protected vector.

    fn use_lock(mutex: &Mutex<Vec<i32>>) {
        let vec = {
            // acquire the lock
            let mut guard = lock(mutex);
    
            // attempt to return a borrow of the data
            access(&mut guard)
    
            // guard is destroyed here, releasing the lock
        };
    
        // attempt to access the data outside of the lock.
        vec.push(3);
    }
It doesn't compile because the lock is not held long enough.

    error: `guard` does not live long enough
    access(&mut guard)
                ^~~~~
There are several more examples in that article, but you can read them there rather than here!
3 comments

Thanks (upvoted)! However how in the world would the compiler know if the mutex belongs to the same vector you are accessing? (Unless you're saying the mutex wraps the vector implying that each vector can have one corresponding mutex at most? Which seems quite limiting?)
Thanks!

Yes, when you construct a mutex, you give it ownership of the vector you want it to protect. Due to the way Rust works, once you've given ownership to something else, you can no longer access it yourself.

The only way you can get access to the data again is to "borrow" a reference to it, but this borrow has a "lifetime", which is tied to the period for which you're holding the mutex. This is how the compiler can spot that you've tried to access the vector after the mutex has been released.

As you say, this means that each vector can only have one mutex wrapping it - at least for this implementation of mutex, known as std::sync::Mutex ( https://doc.rust-lang.org/std/sync/struct.Mutex.html).

However, if you're looking for something like a read/write lock, Rust supports that too via std::sync::RwLock - see https://doc.rust-lang.org/std/sync/struct.RwLock.html for more details.

Obviously, the example uses a vector, but Rust has a pretty strong system of generics, so your mutex (or read/write lock) can wrap pretty much any type, e.g. a struct you've defined.

Hope that makes sense - this was one of the areas of Rust that impressed me most!

Yeah that makes sense! It strikes me as pretty similar to volatile-correctness in C++ (it's a technique -- look it up if you haven't heard of it) with the additional constraint that there is only ever one mutable reference, for better or for worse.
For cases where you need more than one mutable reference, you can use `Cell` (single-threaded) or atomics, which allow mutation through shared references.
> Unless you're saying the mutex wraps the vector implying that each vector can have one corresponding mutex at most?

The mutex owns the vector. Locking it returns a MutexGuard, which is a smart pointer/reference to the data it owns[0]. You can nest mutexes and structures if you need something slightly more flexible than a single big lock.

Having multiple (side-by-side( locks for a single structure sounds like a recipe for concurrency bugs & deadlocks.

> Which seems quite limiting?

If your semantics call for a mutex it makes perfect sense. There are other types of locks if you don't want a 1:1 relationship e.g. RWLock allows multiple readers XOR a single writer.

[0] mutable, so you can replace the structure behind the guard, you just can't keep a reference to it after you release the lock due to Rust's borrow checking semantics

> Having multiple (side-by-side( locks for a single structure sounds like a recipe for concurrency bugs & deadlocks.

It is, however, often necessary in order to reduce an undesirable level of contention. For example, hash tables often use fine-grained locking (with one lock per bucket) in order to increase actual parallelism and to prevent the hash table from becoming a serialization bottleneck.

Another common example is the implementation of FIFO queues with two locks (one for the head and one for the tail), which allows threads that read from the queue to avoid/reduce contention with threads that write to the queue.

This is both normal and frequently necessary for performance.

I think the way that you'd implement the per-bucket locks in a hash table is by having a globally shared (but read-only) vector of buckets, each of which would be a mutex wrapping a vector of key-value pairs. In that way, you don't have multiple locks on the same data structure - you instead have one lock per bucket, which falls back to the simple case described above.

I think an approach similar to this (except with multiple sub-hash-tables, each of which is individually locked) is used by https://github.com/veddan/rust-concurrent-hashmap.

For the two-lock FIFO case, you can definitely do this in Rust but you might need to mark it "unsafe". (I'm happy for someone to prove me wrong on this, though!) "Unsafe" tells Rust "look, I know what I'm doing here" and allows you to do things like operate on raw pointers and generally avoiding checks. However, since Rust is no longer checking these assumptions, you need to be very sure of them yourself!

On the plus side, the rest of your code (e.g. the code using your two-lock FIFO) need not know that an implementation is unsafe - the Rust compiler can still check all the assumptions in that code.

It's arguably even a little bit nicer in SaferCPlusPlus[1]:

    template<typename TVectorPointer>
    void write_foo(TVectorPointer vec_ptr) {
        (*vec_ptr)[3] += 1;
    }
    
    template<typename TVectorPointer>
    auto read_foo(TVectorPointer vec_ptr) {
        return (*vec_ptr)[7];
    }
    
    template<typename TVectorAccessRequester>
    void use_shared_vector(TVectorAccessRequester vec_ar) {
        {
            // obtain a non-const pointer to the vector from the "access requester"
            // blocking if necessary
            auto wl_ptr = vec_ar.writelock_ptr();

            write_foo(wl_ptr);

            // the pointer owns a lock on the vector so as long as the pointer
            // exists it is safe and valid to use
        }

        // obtaining and using a const pointer to the vector
        auto res1 = read_foo(vec_ar.readlock_ptr());
        
        // without a "lock pointer" obtained from an access requester, there is no way
        // to access, or even refer to the shared vector, so you can't access it in an
        // unsafe manner
    }
[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus#asynchronou...
But you can implement the same abstraction in C++. It bugs me when people tout library features as language features.
You can't implement it soundly.
Define "soundly". Follow the rules and it won't break. Will it break if someone memcpys some object internals or something? Sure. Just don't do that. The rules that make this abstraction robust in C++ are easy to follow, just like the rule that says "don't use unsafe" in Rust. Just like Rust, you can break the rule if you know what you're doing.
> Follow the rules and it won't break.

That's the problem here: you're on the honor system that everyone knows and follows every rule every time. Maybe you have a really top-notch team, great code review, etc. but can you say for a certainty that this will always be true, or that it's true of every bit of code you use?

Being able to prove that in advance, especially in more complicated scenarios, has a significant value from checking on every build, especially when you think of the many bugs which have been caused by maintenance code breaking some of the assumptions which the original authors had.

Virtually no large scale C++ app has successfully consistently followed the safety rules. It is too easy to, for example, store references to the mutex-protected object somewhere and have those references persist past the unlock operation, allowing unsynchronized access. References and raw pointers are created invisibly in C++ (example: "this"), unlike in Rust where "unsafe" is always explicit (and can be forbidden with a compiler switch or source annotation).
It’s an exactly same reasoning behind static & dynamic type system. “Don’t put null value to this function and it will not break. Just put object with these properties into this function and it will not break.” Static type system allow you to ENFORCE those contracts regarding what is the data. Same with this, Rust’s affine type system allows you to ENFORCE the contracts regarding HOW to use the data (i.e. if already be free, it couldn’t be used any more) as opposed to the infamous guideline “Don’t use the data after it’s already been freed “
You can't just wrap a statement the compiler has deemed to not pass the borrow checker and make it compile. Unsafe blocks can bypass the type system, but you have to call extra magic to do so.
So, really, what we're discussing is how much ceremony a language ought to require before enabling unsafe behavior. I think C++ provides enough ceremony that you can write decently safe programs in C++.
It's not about ceremony - it's disingenuous to say that C++ can enforce the same safety rules as Rust w.r.t. to lifetimes or that C++ is as safe as Rust is within it's unsafe blocks.

Whilst currently the borrow checker is a bit greedy, it's safe to say that most of the time the borrow checker is checking the code is safe the same way I try to reason about memory accesses in C and C++ when dealing with concurrency or freeing memory. The power of the borrow checker shows itself when reasoning about a local function and no longer requires you to reason about the whole application state as the compiler ensures locality to the extent that shared mutable state is not allowed.

That has empirically been proven false over a decade of C++ use. Additionally, C++ allows for plenty of memory safety problems without any syntactic ceremony.