The ABI discussed here is a bit higher level than that. Even with a fixed ABI for vtables, calling conventions, etc., you still have to care about what happens when you change a "vocabulary type" in the standard library: some changes are source-compatible but binary-breaking.
For example, C++11 broke ABI at this level by changing the representation of std::string.
Oof I was unaware of the string ABI break (just had to look it up) - that’s kind of gross :-/
That said, in general ABI compatibility is expected of all changes unless there is a very good reason to break it. Security seems like a good one; performance, alas, not. I assume std::string got the small string optimization because they were already going to break ABI for the threading issues.
Of course it doesn’t help that C++ is still C-like, so the implementation and use of a type's data layout get compiled directly into the client application :-/
Am I wrong in thinking that this mild form of schizophrenia (no official ABI, but don't break the ABI) is part of the reason why we now live in a world where all applications talk to each other through sockets, incurring huge overheads?
Have you actually measured? The overhead isn't the socket, but the marshaling and unmarshaling, and you need these things for sockets, shared memory, or any other IPC.
You don't need to (un)marshal necessarily; stuff like strings, or arrays of integers, can go straight across. And if you have to pass large amounts of data, they'll probably take a form that's something like that.
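To make the "straight across" case concrete, here is a minimal sketch. It assumes both ends run on the same machine (same endianness, same integer width) and that the payload is a contiguous array of fixed-width integers. The names to_bytes and from_bytes are made up for illustration; the byte buffer stands in for whatever the socket or shared-memory segment carries.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Sender side: the array's bytes are copied verbatim, with no per-element encoding.
// Assumption: the receiver has the same endianness and the same int32_t layout.
std::vector<unsigned char> to_bytes(const std::vector<std::int32_t>& values) {
    static_assert(std::is_trivially_copyable<std::int32_t>::value,
                  "only trivially copyable element types can be copied byte-for-byte");
    std::vector<unsigned char> out(values.size() * sizeof(std::int32_t));
    if (!out.empty()) {
        std::memcpy(out.data(), values.data(), out.size());
    }
    return out;
}

// Receiver side: reinterpret the bytes as the same array.
std::vector<std::int32_t> from_bytes(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % sizeof(std::int32_t) == 0);
    std::vector<std::int32_t> out(bytes.size() / sizeof(std::int32_t));
    if (!out.empty()) {
        std::memcpy(out.data(), bytes.data(), bytes.size());
    }
    return out;
}
```

Anything with pointers inside it (a std::string object itself, a std::map) cannot be sent this way; only its flat contents can.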
The standard would probably need to introduce destructive move semantics and trivially relocatable types (unique_ptr for example) to enable simplifying the ABI for such types. Then of course an ABI break is required to actually apply the changes.
I don't know good ways to replace unique_ptr here.
Requiring library users to #include the concrete implementation is not good: it inflates compilation time, pollutes namespaces, and pollutes the IDE's autocompletion DB.
You can go C-style, i.e. pass a double-pointer argument to the factory, or return a raw pointer. But then the user needs to remember to destroy the object.
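One possible middle ground, sketched below with made-up names (Widget, widget_create, widget_destroy, widget_value, make_widget): the library boundary stays C-style with an opaque type and raw pointers, and a small header-only wrapper on the user's side hands the pointer to a unique_ptr with a custom deleter. This is only an illustration of the idea, not a claim about how any particular library does it.

```cpp
#include <cassert>
#include <memory>

// ---- what would go in the public header (all names hypothetical) ----
struct Widget;                          // opaque: the definition stays inside the library
Widget* widget_create(int value);       // raw pointer crosses the boundary; caller owns it
void    widget_destroy(Widget* w);
int     widget_value(const Widget* w);

// Header-only convenience so users do not have to remember widget_destroy.
struct WidgetDeleter {
    void operator()(Widget* w) const noexcept { widget_destroy(w); }
};
using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

inline WidgetPtr make_widget(int value) {
    return WidgetPtr(widget_create(value));
}

// ---- what would go in the library's .cpp ----
struct Widget {
    int value;
};

Widget* widget_create(int value) { return new Widget{value}; }
void    widget_destroy(Widget* w) { delete w; }
int     widget_value(const Widget* w) {
    assert(w != nullptr);
    return w->value;
}
```

The user never sees the concrete Widget definition, and the unique_ptr is built entirely in the caller's code, so it never has to be passed across the library boundary.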
It's the overhead vs. passing a raw pointer. The Itanium ABI says that std::unique_ptr has to be passed by address because of its non-trivial special member functions (the ABI doesn't know whether it stores a pointer to itself).
Compilers have an attribute to remove this overhead, but it's an ABI break to do it.
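A small sketch of why the two cases differ. The function names take_raw and take_owned are made up; the static_asserts only check properties of the types that the Itanium rule looks at (a non-trivial destructor or copy/move constructor makes a type "non-trivial for the purposes of calls"). The attribute referred to above is presumably Clang's [[clang::trivial_abi]].

```cpp
#include <memory>
#include <type_traits>

// Same intent, different calling convention under the Itanium C++ ABI
// (hypothetical declarations, never called here):
void take_raw(int* p);                    // the pointer itself can travel in a register
void take_owned(std::unique_ptr<int> p);  // caller builds a temporary and passes its address

// A raw pointer is trivial in every respect the ABI cares about.
static_assert(std::is_trivially_copyable<int*>::value, "raw pointer is trivially copyable");
static_assert(std::is_trivially_destructible<int*>::value, "raw pointer has a trivial destructor");

// unique_ptr holds the same single pointer in the major standard libraries...
static_assert(sizeof(std::unique_ptr<int>) == sizeof(int*),
              "with the default deleter, unique_ptr is pointer-sized");

// ...but its destructor and move constructor are user-provided, so it is
// not trivial for the purposes of calls and is passed by invisible reference.
static_assert(!std::is_trivially_destructible<std::unique_ptr<int>>::value,
              "unique_ptr has a non-trivial destructor");
static_assert(!std::is_trivially_copyable<std::unique_ptr<int>>::value,
              "unique_ptr is not trivially copyable");
```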
When a user calls .size() on a string, the compiler will emit some inlined instructions that access the len_ field at offset +8 bytes into the class (assuming 64-bit system).
Now suppose we modify our implementation of std::string, and we want to change the order of the len_ and capacity_ fields, so the new order is: data_, capacity_, len_. If an executable or library links against the STL and isn't recompiled, it will have inlined instructions that are now reading the wrong field (capacity_).
This is what we mean by the C++ ABI. This is a simple example, but there are a lot of other changes that can break ABI this way.
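Here is a rough sketch of that scenario. StringV1 and StringV2 are made-up stand-ins, not the layout of any real std::string, and size_as_compiled_against_v1 imitates what an inlined .size() amounts to in a binary built against the old header: a load from a fixed offset. It assumes pointers and size_t have the same size, as on typical 64-bit systems.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Old layout the client was compiled against (hypothetical).
struct StringV1 {
    char*       data_;
    std::size_t len_;
    std::size_t capacity_;
};

// New layout after the library swaps the last two fields (hypothetical).
struct StringV2 {
    char*       data_;
    std::size_t capacity_;
    std::size_t len_;
};

// What the stale client effectively does: read a size_t at the offset
// where len_ lived in V1, regardless of what the object really is.
inline std::size_t size_as_compiled_against_v1(const void* s) {
    assert(s != nullptr);
    const auto* bytes = static_cast<const unsigned char*>(s);
    std::size_t out;
    std::memcpy(&out, bytes + offsetof(StringV1, len_), sizeof out);
    return out;
}

// The old len_ offset now lands on capacity_.
static_assert(offsetof(StringV1, len_) == offsetof(StringV2, capacity_),
              "stale inlined code reads capacity_ instead of len_");
```

Handing a StringV2 with capacity_ 32 and len_ 5 to the stale accessor yields 32, and nothing at compile or link time complains.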
That's not exactly the same. What you're referring to is the library ABI for the C++ standard library. Every library which can be linked to dynamically has its own ABI. The language ABI, on the other hand, describes how every library's ABI is defined, describing things like layout and name mangling. So if, for instance, the language ABI were amended to say that class members are arranged in alphabetical order in memory, then capacity_ would always be at offset 0, data_ at offset 8, and len_ at offset 16; if you then changed the order of declarations, the library's ABI wouldn't change. But if the library were compiled with an old compiler that targeted the old ABI, then it would put data_ first, followed by len_, then capacity_. So if you then compiled a new piece of code with a new compiler targeting the new ABI, but linked it against the library, there would be a mismatch.
Nevertheless, certain language changes can force a breaking change to any existing ABI (or even all of them), and the C++ committee does not work in a vacuum. They work with existing implementations and must agree with implementers before making changes to the standard.
For example, there was a change to the definition of std::string in C++11 that forced a break in all commonly used ABIs (MSVC, Itanium at least). This was deemed necessary, but the cost of it to real-world programs has proven higher than anticipated, and it may be a decision they regret (it apparently still causes problems and requires special flags even today).
On the contrary, one of the reasons for COM is to not depend on the C++ ABI, which is not stable at all under Windows. It has been de facto stable for the last three versions of MSVC, but was broken each time before, and the recent stable stretch is no indication that this ABI compatibility will continue. It is actually well known that MS internally maintains an ABI-incompatible version of the STL that will very probably be used in the future, to fix some issues and optimize things.
On another platform, libstdc++ is mostly backward compatible, within reason.
The C++ standard is not "officially" concerned with stability, except that in practice people on the committee care a lot (because some major implementations care a lot), so some modifications are rejected because they would break the ABI currently used in practice.
In regards to COM I was specifically thinking about how virtual functions and methods are laid out. This cannot change without breaking a lot of code. Already this causes issues in non-C++ languages (e.g. the difference with thiscall in C++).
You don't get a stable ABI by accident. MS chose to maintain stability on these recent releases. This is a change from past practice, in response to customer needs.
Customers who needed stability were staying on ancient compilers. MS probably would rather have them using new versions, and exercising new features.
No, but it has a large number of implementing ABIs subject to complicated requirements. Both those explicit requirements upon ABIs and individual implementer decisions create legacy that conflicts with things that could make the language better.
All platforms have a standard ABI. Windows' (more specifically, MSVC's, as mingw g++ does not follow it) is mostly undocumented, but substantial portions are reverse-engineered. Most other platforms use some modification of the Itanium ABI, which describes the ABI in terms of C structs and functions. ARM uses Itanium, with a somewhat different mechanism for exception handling.
a. Having a real binary compatibility story is beneath C++, but
b. The accidental ABI compatibility that exists today is too widely adopted to break.