by rwbhn 2326 days ago
Discussion of whether the next (or any future) C++ release should break ABI compatibility.
1 comments

Does C++ have an ABI?
Every implementation implicitly defines an ABI due to the size and layout of classes/structs defined in the STL.

Here's a simple example. Suppose we define std::string with the following layout (for simplicity I'm removing the template stuff, SSO, etc.):

  class string {
   public:
    // various methods here...

    size_t size() const { return len_; }
 
   private:
    char *data_;
    size_t len_;
    size_t capacity_;
  };
When a user calls .size() on a string, the compiler will emit some inlined instructions that access the len_ field at offset +8 bytes into the class (assuming a 64-bit system).

Now suppose we modify our implementation of std::string, and we want to change the order of the len_ and capacity_ fields, so the new order is: data_, capacity_, len_. If an executable or library links against the STL and isn't recompiled, it will have inlined instructions that are now reading the wrong field (capacity_).
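A minimal sketch of that mismatch, under stated assumptions: StringV1, StringV2 and size_as_compiled_against_v1 are hypothetical names made up for illustration, not anything from a real standard library. The helper imitates what an old binary has baked in after size() was inlined, namely a load at the offset len_ had in the old layout.

```cpp
#include <cassert>   // assert, for trying the sketch out
#include <cstddef>   // std::size_t, offsetof
#include <cstring>   // std::memcpy

// Hypothetical "old" layout: data_, len_, capacity_.
struct StringV1 {
    char*       data_;
    std::size_t len_;
    std::size_t capacity_;
};

// Hypothetical "new" layout: len_ and capacity_ swapped.
struct StringV2 {
    char*       data_;
    std::size_t capacity_;
    std::size_t len_;
};

// Stand-in for code compiled against the old header: it reads a size_t
// at the fixed offset where len_ used to live, whatever object it is given.
inline std::size_t size_as_compiled_against_v1(const void* s) {
    std::size_t out;
    std::memcpy(&out,
                static_cast<const char*>(s) + offsetof(StringV1, len_),
                sizeof out);
    return out;
}
```

Handing a StringV1 with len_ = 5 and capacity_ = 16 to the helper yields 5. Handing it a StringV2 holding the same values yields 16, because the old offset now lands on capacity_.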

This is what we mean by the C++ ABI. This is a simple example, but there are a lot of other changes that can break ABI this way.

That's not exactly the same. What you're referring to is the library ABI for the C++ standard library. Every library which can be linked to dynamically has its own ABI. The language ABI, on the other hand, describes how every library's ABI is defined, describing things like layout and name mangling. So if, for instance, the language ABI were amended to say that class members are arranged in alphabetical order in memory, then capacity_ will always be at offset 0, data_ at offset 8, and len_ at offset 16; if you then change the order of declarations, the library's ABI won't change. But if the library were compiled with an old compiler that targeted the old ABI, then it would put data_ first, followed by len_, then capacity_. So if you then compiled a new piece of code with a new compiler targeting the new ABI, but linked it against the library, there would be a mismatch.
It doesn't have a standard ABI.

Nevertheless, certain language changes can force a breaking change to any existing ABI (or even all of them), and the C++ committee does not work in a vacuum. They work with existing implementations and must agree with implementers before making changes to the standard.

For example, there was a change to the definition of std::string in C++11 that forced a break in all commonly used ABIs (MSVC, Itanium at least). This was deemed necessary, but the cost of it to real-world programs has proven higher than anticipated, and may be a regretted decision (it apparently still causes problems and requires special flags even today).

C++ doesn't, but Windows C++ does, to the extent it's implicitly used for things like COM. Any breaking change would have to be managed carefully.
On the contrary, one of the reasons for COM is to not depend on the C++ ABI, which is not stable at all under Windows (it has been de facto stable for the last three versions of MSVC, but was broken each time before, and the recent stable streak is not an indication that this ABI compatibility will continue; actually, it is well known that MS internally maintains an ABI-incompatible version of the STL that will very probably be used in the future, to fix some issues and optimize things).

On another platform, libstdc++ is mostly backward compatible, within reason.

The C++ standard is not "officially" concerned with stability, except that in practice people in the committee care a lot (because some major implementations care a lot), so some modifications are rejected because they would break the ABI currently used in practice.

In regard to COM, I was specifically thinking about how virtual functions and methods are laid out. This cannot change without breaking a lot of code. Already this causes issues in non-C++ languages (e.g. the difference with thiscall in C++).
The vtable layout is much smaller and much more stable than C++ as a whole. COM has a ton of value.
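To make the "small and stable" point concrete, here is a heavily simplified sketch of a COM-style binary contract written without any C++ virtual functions: an interface pointer refers to an object whose first member points to a table of function pointers. ICounter, ICounterVtbl, CounterImpl and make_counter are hypothetical names for illustration only. Real COM additionally involves QueryInterface, HRESULT return values and a specified calling convention, all omitted here.

```cpp
#include <cassert>   // assert, for trying the sketch out
#include <cstdint>   // std::uint32_t

struct ICounter;  // hypothetical interface, not a real COM interface

// The whole contract: a fixed-order table of function pointers.
struct ICounterVtbl {
    std::uint32_t (*AddRef)(ICounter* self);
    std::uint32_t (*Release)(ICounter* self);
    int           (*Increment)(ICounter* self, int by);
};

// What a client sees: an object whose first member is the table pointer.
struct ICounter {
    const ICounterVtbl* vtbl;
};

// Implementation detail, invisible to clients. iface must stay the first
// member so an ICounter* can be converted back to a CounterImpl*.
struct CounterImpl {
    ICounter      iface;
    std::uint32_t refs;
    int           value;
};

static std::uint32_t counter_addref(ICounter* self) {
    return ++reinterpret_cast<CounterImpl*>(self)->refs;
}

static std::uint32_t counter_release(ICounter* self) {
    // A real implementation would destroy the object when this reaches 0.
    return --reinterpret_cast<CounterImpl*>(self)->refs;
}

static int counter_increment(ICounter* self, int by) {
    CounterImpl* c = reinterpret_cast<CounterImpl*>(self);
    c->value += by;
    return c->value;
}

static const ICounterVtbl kCounterVtbl = {
    counter_addref, counter_release, counter_increment
};

// Builds a counter with one reference and a value of zero.
static CounterImpl make_counter() {
    return CounterImpl{ ICounter{ &kCounterVtbl }, 1, 0 };
}
```

A client only ever calls through the table, e.g. p->vtbl->Increment(p, 3), so the fields of CounterImpl after iface can be reordered or extended without affecting callers, as long as the table's order and signatures stay fixed.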
And it got even better with the UWP changes.
You don't get a stable ABI by accident. MS chose to maintain stability on these recent releases. This is a change from past practice, in response to customer needs.

Customers who needed stability were staying on ancient compilers. MS probably would rather have them using new versions, and exercising new features.

No, but it has a large number of implementing ABIs subject to complicated requirements. Both those explicit requirements upon ABIs and individual implementer decisions create legacy that conflicts with things that could make the language better.
All platforms have a standard ABI. Windows' (more specifically, MSVC's, as mingw g++ does not follow it) is mostly undocumented, but substantial portions are reverse-engineered. Most other platforms use some modification of the Itanium ABI, which describes the ABI in terms of C structs and functions. ARM uses Itanium, with a somewhat different mechanism for exception handling.
Not one described by the C++ Standard. Neither does C.