Hacker News
by olliej 791 days ago
I know C++ the language but not the STL (the overwhelming abundance of UB and total lack of safety make it anathema), so my question is: why does the STL allow/require copying rather than moving here, depending on whether an object has a no-throw move constructor?

Note I’m not asking about move constructor vs memmove/cpy but rather about the use of copy constructor vs move depending on exception behavior. Is it something like prefer no throw copy constructor over “throwing” move?

7 comments

That’s a bit like saying you know C++ but not streams or templates, or C but not floating point operations. It’s probably worth learning STL.

Anyway, the reason to use move instead of copy is for performance. Move constructors are faster because they can leave the source object modified (e.g., take over control of a pointer to deep contents). This falls apart when the move constructor can throw, because the container might be part way through a resize when this happens, leaving the object before the exception modified and the code in an unrecoverable state.

Basically, unless we can be super duper 100% certain we’re going to make it through the operation without throwing an exception, we’re going to copy, leaving the objects in question in an unaltered state, and holding to the promises of the standard.

I phrased that badly - it should have been “I don’t know every edge case in the STL, and so I don’t know why this would have different behavior”.

However thanks for explaining the issue. This one is obvious and I just completely failed to think about how you ensure the source object is in a safe state if an exception occurs part way through moving the source data. It seems to imply the old MSVC behaviour was incorrect in such a scenario, but I hadn’t considered that possibility so assumed it was correct and therefore didn’t think of why this behaviour is required.

My solution is of course to simply not allow exceptions because the c++ model of everything implicitly throwing is just as annoying as Java’s “let’s explicitly annotate everything” model albeit with different paths to sadness.

Fair enough. Honestly, there are very few people in the world who could confidently claim that they know all of the STL. The first place I worked at disallowed it (MSVC 5 was fresh then, so it was somewhat understandable), and we had our own performance centric data structures. But the value of container classes and promises about performance first really hit me when I went to my next job and dug in on the STL. Truly eye opening stuff, and absolutely available for reading (which was pretty cool at the time).
> leaving the object before the exception modified and the code in an unrecoverable state

It isn't likely to leave the code in an unrecoverable state even if recovery is calling std::terminate (or worse).

It is likely to leave the data in an unrecoverable state. Imagine that a vector of 4 items was resized -- the first two objects move successfully, but the third one throws an exception. Then in your move function, you catch that and decide to undo your changes before propagating the error. Then when you're undoing the changes the first object throws an exception when it's being moved back. Oof! At best you've got multiple active exceptions (legal if you're in the catch handler, but should be rare and definitely should be avoided) and at worst your data is indeed unrecoverable (thus one of many reasons why std::terminate is the default option when multiple exceptions are alive on the same stack).

Sure, but assumptions about the state of data are made in real world code. It’s just a mess, which is why the code in question needs to break into jail and very explicitly indicate that it isn’t going to fire off exceptions (and hold to it). Honestly, C++ move semantics and default behaviors could be its own very lengthy conversation. It’s why the bulk of my C++ code over the years has been explicit about references and copies (since most of my C++ code has been about high performance real time rendering or data analysis).
The overwhelming C++ priority beating both safety and performance, often to the consternation of the performance people, is backwards compatibility with dusty archaic code. If it was written by somebody whose funeral was last century, WG21 thinks it's important that it still compiles in your C++ 23 compiler whenever you get one of those. Not crucial. Not so that they actually defined the language sensibly to avoid compatibility problems, but important enough to trump mere performance or safety concerns.

Last century move didn't exist. The terrible C++ move (which is basically an actual "destructive move" plus a default create step) was invented for C++ 11 which was, as the name suggests, standardised only in 2011.

So back then everybody was using copy assignment semantics. Your compiler might be smart enough, especially for trivial cases, to spot the cheap way to deliver the required semantics, but it might not, especially as things get tricky (e.g. a std::vector of std::list), and semantically it's definitely a copy, not a move.

As a result the "non-move" that you're astonished by is how all C++ code last century was written, the semantics you're just assuming as necessary didn't even exist in ISO C++ 98 and it is considered important that such code still works.

I think the other replies may have misunderstood your question. I think you are asking:

Why does std::vector<T> require T's move constructor to be noexcept (or else it falls back to copying instead)?

The reason goes something like this:

When std::vector<T> grows, it needs to move or copy all of its elements into a new, larger-capacity array. It would prefer to move them, since that's a lot more efficient than copying (for non-trivial types). But what happens if it moves N elements, and then the move constructor for element N+1 throws an exception? Elements 0-N have been moved away already, so the vector is no longer valid as-is. Should it try to move those elements back to the original array? But what if one of those moves fails?

The C++ standards body decided to sidestep this whole problem by saying that std::vector<T> will refuse to use T's move constructor unless it is declared noexcept, so the above problem can't happen.

In my opinion, this was a huge mistake. Intuitively, everyone expects that when an std::vector<T> grows, it's going to move the elements, not make a ton of copies. Often, these copies result in hidden performance problems. Arguably the author of this post is lucky that in their case, the copies resulted in outright failure, thus revealing the problem.

There seem to be two other possibilities:

* std::vector<T> could simply refuse to compile if the move constructor was not `noexcept`. I think this could have been done in a way that wouldn't have broken existing code, if it had been introduced before move constructors existed in the wild -- unfortunately, that ship has now sailed and this cannot be done now without breaking people.

* std::vector<T> could always use move constructors, even if they are not declared `noexcept`, and simply crash (std::terminate()) in the case that one actually throws. IMO this would be fine and is the best solution. Move constructors almost never actually throw in practice, regardless of whether they are declared as such, because move constructors are almost always just "copy pointer, null out the original". You don't put complex logic in your move constructor. And anyway, C++ already has plenty of precedent for turning poorly-timed exceptions into terminations; why not add another case? But I think it's unlikely the standards committee would change this now.

Honestly, trying to move the elements back and calling std::abort if that fails seems fine. It is indeed an exceptional happenstance, and how quickly you can recover from it is probably not as important as being able to recover correctly. And who catches exceptions around resize()/push_back() anyway?
Throw specifications do not change function call binding behavior.

The move constructor and move assignment operator will bind to an R-value reference if they are available. Conversely, if functions that are declared not to throw anything do end up throwing something, the result is std::terminate.

The only things that determine whether to use a move or a copy are whether the reference is an R-value and whether the source or destination is const.

You can declare a {const R-value move operator (not a constructor) for the left-hand side} and/or {const R-value move operator or constructor for the right-hand side} of the argument. But you won't be able to modify anything not marked mutable. You shouldn't do that though: that's a sure way to summon nasal demons. That said, I see it fairly often from less experienced engineers, particularly when copy-pasting a copy operator intending to modify it for a move operator.

What do you use instead of std::vector, map, unique_ptr, etc?

I have a hard time thinking of C++ and the STL as separate. Even our internal utilities and such tend to be STL-like although often with safer defaults.

Lots of the C++ standard library, including the STL containers, isn't provided in freestanding C++. Now, in reality freestanding C++ has been kind of a joke - the committee for years barely bothered to keep it working - especially compared to freestanding C (which is well defined and used all over the place) and say Rust no_std (likewise) - and so many embedded systems may have the entire standard library notionally available even though parts of it are definitely nonsense for them and they've got local rules saying not to use the parts that would definitely explode in their environment... but many C++ programmers who have worked under such rules just reflexively avoid the STL's containers and maybe its algorithms even in an environment where those would work.
Essentially the same things but reimplemented safely - see WTF in webkit.

There are still issues (the iterator API used by for(:) is very hard to make safe without terrible perf issues, though I was looking at this recently and the compilers are doing much better than they used to).

Things like unique_ptr and shared_ptr do not meaningfully improve the security of c++ despite being presented as if they did (all serious c++ projects already had smart pointers before the stl finally built them in so presenting them as a security improvement is disingenuous), and because of the desire to have shared_ptr be noninvasive it’s strictly worse than most other shared ownership smart pointers I’ve used.

Yep, we've always had our own implementation of std::shared_ptr<> for this reason.

Either the reference count is elsewhere (and now you have to dereference another area of memory occasionally, which is the worst case for cache performance), or it's alongside your object. If it's alongside your object, it's better to know it's there for padding, etc.

And it's easy to forget to allocate for the alongside case, so you can have hidden poor performance.

I have a blog post on the topic here: https://quuxplusone.github.io/blog/2022/08/26/vector-pessimi...

The TLDR is: Using `move_if_noexcept` instead of plain old `move` can help you provide the "strong exception guarantee." For what _that_ is, see cppreference: https://en.cppreference.com/w/cpp/language/exceptions#Except...

and/or the paper by Dave Abrahams that introduced the term, "Exception-Safety in Generic Components: Lessons Learned from Specifying Exception-Safety for the C++ Standard Library." https://www.boost.org/community/exception_safety.html

> Is it something like prefer no throw copy constructor over “throwing” move?

Almost. If move won't throw (or if copy isn't possible), we'll move. But given a choice between a throwing move and any kind of copy, we'll prefer copy, because copy is non-destructive of the original data: if something goes wrong, we can roll back to the original data. If the original data's been moved-from, we can't.

UB is a feature; people who keep on fighting it are such a pain.

Regarding your question, nothrow operations are essential to maintaining invariants. And maintaining invariants is how you make code correct in a world where UB exists.

Yes, if the programmer maintains certain invariants, C's flavour of UB allows the compiler to take advantage of those invariants for performance gains, by omitting run-time checking for those invariants.

The problem with this flavour of UB being a programmer's promise "this is fine, trust me, no run-time checks needed" to the compiler is that a) it's made by the programmer by omitting said run-time checks, and that often happens accidentally, not intentionally; b) compilers are really bad at pointing out to the programmer the places where they took advantage of such promises, which really complicates the task of writing conforming programs. Every time I add two ints, I promise the compiler that an overflow won't happen; and of course, the moment UB happens, all invariants cease to hold, so trying to find the initial bug where you accidentally broke one of the invariants turns into a nightmare.

A program's correctness goes way beyond memory safety, and is entirely at the mercy of the programmer doing a good job.

This is true regardless of whether the language has undefined behaviour or not.