by mrtracy 775 days ago
The C++ smart pointers don't prevent multiple threads from mutating the pointed-to data at the same time; multiple threads can access the same unique_ptr concurrently and mutate its contents. Rust requires shared pointers (Arc) to also explicitly implement some sort of Mutex-equivalent runtime safety check in order to mutate the data. Rust also has an explicit notion of thread ownership, and of whether individual types are safe to pass to different threads; if a construct is not thread safe, Rust will prevent you from using it in multiple threads.
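
To make the C++ side of this concrete, here is a minimal sketch (the `Counter` type and the `count_with_threads` helper are made up for illustration). The `std::shared_ptr` only manages the lifetime of the object, so the lock has to be added by hand, and nothing in the language complains if you forget it.

```cpp
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: the mutex lives next to the data it guards,
// but the compiler does not tie the two together.
struct Counter {
    std::mutex mtx;
    long value = 0;
};

// Spawns `threads` workers that each increment the shared counter `iters` times.
long count_with_threads(int threads, int iters) {
    auto counter = std::make_shared<Counter>();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        // Each worker owns its own copy of the shared_ptr; the reference
        // count itself is updated atomically.
        workers.emplace_back([counter, iters] {
            for (int i = 0; i < iters; ++i) {
                // Mutating the pointee is only safe because of this explicit
                // lock. Removing it still compiles, but it is then a data race.
                std::lock_guard<std::mutex> guard(counter->mtx);
                ++counter->value;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter->value;  // all workers have joined, so no lock is needed
}
```

Calling `count_with_threads(4, 10000)` should give 40000 with the lock in place; without it the result is undefined.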

As a benefit of the thread-safety notion, Rust can have two reference-counting pointer types: Arc, which uses atomic reference counting and is roughly equivalent to std::shared_ptr, and Rc, which does not use atomics. Rc cannot be used across multiple threads at the same time, and the compiler will prevent you from doing this (strictly speaking it is the Send/Sync trait check that rejects it, rather than the borrow checker).

Rc is appropriate for data structures which internally benefit from multiple pointers (e.g. graphs) but where all of that information is internal to a single data structure - this becomes available without paying the price of atomics.


Thanks.

So basically Rust is combining object ownership and thread safety while C++ keeps thread safety separate, which would seem to provide more flexibility, but also lets you shoot yourself in the foot.

Just thinking out loud, I wonder if C++ could better address this by also having a class of thread-aware smart pointers? The problem is that C++ always has both the old and the new (C and C++) way of doing things (pthreads vs. std::thread, std::mutex, etc.), so even if the language provides easier ways of writing bug-free code, there is no way to force developers to use those facilities.

In C++ there is also the issue of how to make statically allocated data structures thread safe in an enforceable way. Another kind of smart reference object, perhaps? Disallow global objects not accessed by such references?

C++ (which I have used since long before C++11) really wants to be two conflicting things - encompassing C's low level role as the ultimate systems programming language with no guardrails, while also wanting to compete as a much higher-level safer language for application developers. Perhaps the two safe+unsafe roles can be better combined into one language if one were to start from scratch. I'm not sure that Rust gets it right either - erring in the other direction by not being flexible enough.

In what ways do you think Rust is not flexible enough?

I ask because I can think of a few ways it’s less flexible than C, but I also think that effect is massively overstated by people who aren’t familiar with the language. There are OS kernels written in Rust, for example.

From what I've read, it seems that certain types of data structure (including anything with potentially circular references) are difficult to write in Rust: you end up fighting the language more than it helps you. I'm really comparing to C++ rather than C (where of course anything is possible, as long as you DIY).

Yes, data structures with cyclic references are a bit harder to write in Rust than in C or C++. But it’s not impossible. And IMO, you write those so rarely that it really doesn’t matter.

So ARC is something like the following?

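    #include <memory>
    #include <mutex>
    #include <shared_mutex>
    #include <utility>

    template <typename T>
    struct Locker {
        using M = std::shared_mutex;

        // Exclusive (read-write) access: holds the mutex for its lifetime.
        struct Locked {
            Locked(M& mtx, std::shared_ptr<T> value)
                : m_lock(mtx), m_value(std::move(value)) {}
            T* operator->() const { return m_value.get(); }
            T& operator*() const { return *m_value; }
        private:
            std::lock_guard<M> m_lock;
            std::shared_ptr<T> m_value;
        };

        // Shared (read-only) access: holds a shared lock for its lifetime.
        struct Shared {
            Shared(M& mtx, std::shared_ptr<const T> value)
                : m_lock(mtx), m_value(std::move(value)) {}
            const T* operator->() const { return m_value.get(); }
            const T& operator*() const { return *m_value; }
        private:
            std::shared_lock<M> m_lock;
            std::shared_ptr<const T> m_value;
        };

        // Forwarding ctor: always allocates, so m_value is never null.
        template <typename... Args>
        explicit Locker(Args&&... args)
            : m_value(std::make_shared<T>(std::forward<Args>(args)...)) {}

        // Needs C++17 guaranteed copy elision: std::lock_guard is not movable.
        Shared shared() { return Shared{m_mutex, m_value}; }
        Locked locked() { return Locked{m_mutex, m_value}; }

    private:
        std::shared_ptr<T> m_value;
        M m_mutex;
    };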
Rust `Box` = C++ `std::unique_ptr`, both have the same ABI (just pointers)

Rust `Arc` = C++ `std::shared_ptr`

Rust `Rc` = C++ `std::shared_ptr` but using a simple integer instead of an atomic so it is not thread safe

`Arc` and `Rc` do not allow you to mutate their contents directly, so instead you should use "interior mutability" via something like a `Mutex` (thread-safe) or `RefCell` (not thread-safe), which have runtime checks to ensure no undefined behaviour is introduced. So `Arc<Mutex<T>>` makes it possible to mutate `T`, but a plain `Arc<T>` does not. Some types, like atomics, can be modified through a shared reference and need no extra wrapper, so an `Arc<AtomicBool>` can be mutated directly.

An example of a big C++ codebase using something similar is Chromium, where `std::shared_ptr` is forbidden and `base::RefCounted` (Rust `Rc`) and `base::RefCountedThreadSafe` (Rust `Arc`) should be used instead. WebKit does this too.

> both have the same ABI (just pointers)

This is not actually true, but it's close enough for your purposes here.

But just to be clear about it, see stuff like this: https://stackoverflow.com/questions/58339165/why-can-a-t-be-...

Another reason it is not true: Rust has fat pointers, e.g. `std::unique_ptr<const uint8_t[]>` and `Box<[u8]>` both own the same kind of allocation, but the `Box` will be 128 bits wide on 64-bit systems.

What's the utility of having a 128-bit pointer on a 64-bit system?

`Box<[u8]>` stores the pointer and its length (2 x size_t), `std::unique_ptr<const uint8_t[]>` only stores the pointer.

That's for slices; for trait objects (e.g. `Box<dyn ToString>`), the second word is a pointer to the virtual table instead of a length.

https://doc.rust-lang.org/nomicon/exotic-sizes.html
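
For comparison from the C++ side, a small sketch. With the default deleter, `std::unique_ptr<T[]>` is a single pointer on the common standard library implementations (the standard does not strictly guarantee this), while an owning slice that carries its length, roughly what is described for `Box<[u8]>`, is two words. `OwnedSlice` and `make_slice` are made-up names for illustration, not standard types, and the second assertion assumes `size_t` is pointer-sized.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical "fat" owning pointer: data pointer plus element count,
// mirroring the layout described for Rust's Box<[u8]>.
struct OwnedSlice {
    std::unique_ptr<std::uint8_t[]> data;
    std::size_t len;
};

// Helper: allocates `n` zero-initialized bytes and records the length alongside.
OwnedSlice make_slice(std::size_t n) {
    return OwnedSlice{std::make_unique<std::uint8_t[]>(n), n};
}

// Thin pointer: the stateless default deleter takes no space, and the
// length of the array is not stored anywhere the caller can see.
static_assert(sizeof(std::unique_ptr<std::uint8_t[]>) == sizeof(void*),
              "unique_ptr<T[]> is expected to be one pointer wide");

// Fat pointer: pointer + length, i.e. 128 bits on a 64-bit system.
static_assert(sizeof(OwnedSlice) == 2 * sizeof(void*),
              "pointer plus length is expected to be two words");
```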

Where can I find details like this about Rust?

The Rustonomicon is a good start. On fat pointers: https://doc.rust-lang.org/nomicon/exotic-sizes.html

> Because they lack a statically known size, these types can only exist behind a pointer. Any pointer to a DST consequently becomes a wide pointer consisting of the pointer and the information that "completes" them (more on this below).

You say:

> Rust `Arc` = C++ `std::shared_ptr`

GP says:

> Rust requires shared pointers (Arc) to also explicitly implement some sort of Mutex-equivalent runtime safety check in order to mutate the data.

Which is it?

> An example of a big C++ codebase using something similar is Chromium ...

Chromium's smart pointers are similar to their standard counterparts -- no mutexes for write access to pointed data.

Also, tangent but interesting: From https://www.chromium.org/developers/smart-pointer-guidelines...:

> Reference-counted objects make it difficult to understand ownership and destruction order, especially when multiple threads are involved. There is almost always another way to design your object hierarchy to avoid refcounting

Both are true, Rust just has more restrictions. It’s not completely equivalent, but you can think of `Arc<T>` as `std::shared_ptr<const T>`: the only way around the mutability restriction is `unsafe` (or `const_cast` on the C++ side). Otherwise, to mutate you need another abstraction that does the `unsafe` things for you, such as `Mutex`.
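
A rough C++ rendering of that analogy, with made-up names (`Config`, `share_read_only`): through a `std::shared_ptr<const T>` every owner can read, and a write is rejected at compile time unless you reach for `const_cast`. Unlike Rust, C++ does not stop someone else from keeping a non-const `std::shared_ptr<T>` to the same object, so this is only an approximation.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <utility>

struct Config {
    std::string name;
    int retries;
};

// Builds the object mutably, then hands out only a read-only handle to it.
std::shared_ptr<const Config> share_read_only(std::string name, int retries) {
    auto cfg = std::make_shared<Config>(Config{std::move(name), retries});
    return cfg;  // implicit conversion: shared_ptr<Config> -> shared_ptr<const Config>
}

// Dereferencing yields a const reference, so `ptr->retries = 5;` would not compile.
static_assert(
    std::is_const<std::remove_reference_t<
        decltype(*std::declval<std::shared_ptr<const Config>>())>>::value,
    "pointee is read-only through this handle");
```

Copies of the returned pointer can be passed to other threads for reading; any mutation would need a separate mechanism such as a mutex or an atomic member.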

I mentioned Chromium because they also differentiate between thread safe and non-thread safe shared pointers.

If anything, Rust shared pointers are more similar to C++ std pointers because in Chromium the reference count is inside the class, which is very handy because you can reconstruct a smart pointer from a raw pointer (like `this`), at the cost of needing `T` to extend `base::RefCounted`.

Perhaps I am not making myself clear here:

- RefCounted: It's like shared_ptr but refcount load/modify/store operation is not atomic, thus not thread-safe. No synchronization for pointed data.

- RefCountedThreadSafe: It's like shared_ptr. This means refcount load/modify/store is atomic, so has overhead, yet safe to pass across thread boundaries. Again, just like shared_ptr, no synchronization for pointed data.

- Locker class above: It's an (incomplete) wrapper around shared_ptr where read-only access goes through a shared lock and rw access goes through an exclusive lock. I suppose this is what Rust's ARC guarantees at compile time, with less overhead than the sketch above?

So;

> Both are true, Rust just has more restrictions.

No, both are not true. My understanding is: ARC ~= Locker && ARC > shared_ptr

I think that's where you're confused: `Arc` does not do any synchronization, again it's pretty much the same as `std::shared_ptr` (hence the name Arc: Atomically Reference Counted).

Your `Locker` does not do what `Arc` does, even at compile time, because it does not allow concurrent access, like an `Arc<AtomicBool>` would. Your `Locker` is more like an `Arc<RwLock<T>>`.
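
The closest C++ analogue of the `Arc<AtomicBool>` case might look like this sketch (the function and variable names are hypothetical). Both threads touch the shared flag at the same time with no lock held, which the `Locker` above cannot express because every access goes through its mutex.

```cpp
#include <atomic>
#include <memory>
#include <thread>

// Starts a worker that spins until the shared flag is set, then sets the
// flag from the calling thread. Returns true once the worker has seen it.
bool stop_worker_via_shared_flag() {
    auto stop = std::make_shared<std::atomic<bool>>(false);
    bool worker_saw_stop = false;

    std::thread worker([stop, &worker_saw_stop] {
        // Concurrent read: no mutex, the atomic provides the synchronization.
        while (!stop->load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        worker_saw_stop = true;
    });

    // Concurrent write through another copy of the same shared_ptr.
    stop->store(true, std::memory_order_release);
    worker.join();
    return worker_saw_stop;  // safe to read: join() orders the worker's write before this
}
```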

Best equivalent you can get in C++ is `Arc<T>` = `std::shared_ptr<const T>`.

https://doc.rust-lang.org/std/sync/struct.Arc.html

> Shared references in Rust disallow mutation by default, and Arc is no exception: you cannot generally obtain a mutable reference to something inside an Arc. If you need to mutate through an Arc, use Mutex, RwLock, or one of the Atomic types.

I guess you could get the final pieces of something similar by creating `Send` and `Sync` traits in C++: https://doc.rust-lang.org/nomicon/send-and-sync.html. I think the main pain point is that C++ has no way to auto-derive `Send` and `Sync` for a type from its members, so it would end up being very verbose.

FWIW, in C++11, a class C can similarly cooperate to enable reconstructing a shared_ptr from a raw one by deriving from std::enable_shared_from_this<C>.
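
A minimal sketch of that mechanism (the `Node` type is made up for illustration). Note that `shared_from_this()` only works when the object is already owned by a `std::shared_ptr`; since C++17 it throws `std::bad_weak_ptr` otherwise.

```cpp
#include <memory>

// Hypothetical type that can hand out owning pointers to itself.
struct Node : std::enable_shared_from_this<Node> {
    int id;
    explicit Node(int id) : id(id) {}

    // Rebuilds a shared_ptr from `this` that shares the existing control
    // block, instead of creating a second, independent reference count.
    std::shared_ptr<Node> self() { return shared_from_this(); }
};
```

With `auto a = std::make_shared<Node>(7); auto b = a->self();`, both `a` and `b` refer to the same object and share one reference count.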

No, Arc doesn’t require a mutex if you don’t plan on mutating the underlying value.