by gignico 2 days ago
> It seems that some people are really losing the taste for good readable code.

It seems that some people never had taste for good reliable code. Use `void *` and now any error whatsoever is direct undefined behavior. Moreover `std::span` clearly says that you are not taking ownership of the memory (even though the language does not check it, of course), while `void *` does not.

I understand that people can have many things to say about C++, and I do as well, but `std::span` should have been there decades ago and is such a life saver in these situations. A truly zero-cost abstraction which effectively saves you from a lot of troubles.


There's lots of UB in C-family execution models. Some of which is not actually UB because the implementation defines it - e.g. aligned DWORD-sized memory access is atomic on Windows because Microsoft said it is.

By choosing to use this language you choose to navigate the UB. Otherwise you'd be writing in Go, or Python.

It is possible to write reliable code despite the presence of UB in a language, just like it's possible to drive to work every day for 20 years despite most of the directions you can point the car leading to an immediate crash. That's a needle with a much thinner eye than UB in C, and most people manage it. Mainly it means being very careful about lifetime and ownership. The Linux kernel manages it 99% of the time simply by being careful about lifetime and ownership, and that's a project with a huge number of contributors who don't intimately know each other's modules. In the Linux kernel you can't just say "new whatever" - you must have a plan for the lifetime of that whatever, and other people will review it.

I agree with you about std::span.

Yeah but also, quick question:

  struct S {
      char c;
      int i;
  };

  struct S a = {0};
  struct S b = {0};

  memcmp(&a, &b, sizeof(a)) == ...
If you answered 0, you'd be wrong, the answer is undefined, thanks to padding, initialization and alignment rules. Padding bytes are undefined, and not guaranteed to be initialized to zero even if the variable is declared static (where the members would be zeroed).

This is why the compiler is angry at the post writer, and why the reinterpret_cast is needed. Ideally if they wanted to do something with the data, they'd unbox the structure.
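
One way to read "unbox the structure" is to copy each member out individually, so padding bytes never reach the output. The sketch below assumes that reading; `append_field` and `serialize` are hypothetical helpers, and the byte order is simply the host's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct S {
    char c;
    int i;
};

// Appends the object representation of one trivially copyable field.
template <typename T>
void append_field(std::vector<unsigned char>& out, const T& field) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable fields can be copied bytewise");
    unsigned char tmp[sizeof(T)];
    std::memcpy(tmp, &field, sizeof(T));
    out.insert(out.end(), tmp, tmp + sizeof(T));
}

// Hypothetical helper: serializes S member by member, skipping any padding.
std::vector<unsigned char> serialize(const S& s) {
    std::vector<unsigned char> out;
    append_field(out, s.c);
    append_field(out, s.i);
    return out;
}
```

Two objects with equal members then produce equal byte sequences regardless of what their padding happens to hold.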

That's why it's not a good idea to use void* to pass arbitrary data interchangeable with bytes. It's a location, it makes no representation as to what's there and how to interact with it. Let alone who owns it.

std::span solves two problems here. One is the ownership problem. The other is that span<T> is a T[]. void* is god only knows.

The post asserts:

> The code is very clear and straightforward: you pass a pointer to the custom data structure, and its size in bytes. That’s it. Simple and clear.

This is unfortunately entirely false in C thanks to the aforementioned alignment/padding UB (and of course inner pointers). This is addressed with std::span. You'd still have to reinterpret_cast your structure to get the UB.

> Why should people complexify and uglify their C++ code with the uint8_t pointer (or std::byte), when void* works just fine??

tl;dr: because it doesn't. It just kinda looks like it does if you squint, and it's going to lead to the gnarliest bugs in the world.

> even if the variable is declared static

No, for static even padding bytes are zero.

For automatic, yes it may effectively turn a = {} to a.member = 0, leaving the padding bytes uninitialised. Or on copies like a = b it may not copy padding bytes.

Padding bytes are initialized to zero if you zero initialize the aggregate. It is hard to keep those bytes as zero but at initialization this much is guaranteed.

I looked into it some more and it's actually worse.

For static or thread storage, in C11 and later, ={0} will guarantee padding is zeroed. For automatic storage, per C11 6.7.9, only subobjects are required to be zeroed. Padding is not. [1]

In C23 initializing with ={} will give you zeroed padding, initializing with ={0} will not.

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
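
The rules quoted above are C's. On the C++ side, a hedged sketch of how one might avoid depending on padding contents at all: C++17's `std::has_unique_object_representations_v` reports whether equal values always have equal bytes, and a member-wise comparison sidesteps the question. The names `memcmp_comparable`, `TwoInts` and `equal` are made up for this example.

```cpp
#include <cassert>
#include <type_traits>

struct S {
    char c;
    int i;
};

// Hypothetical comparison type with no padding on mainstream platforms.
struct TwoInts {
    int a;
    int b;
};

// True only when two objects with equal values are guaranteed to have equal
// object representations, i.e. when memcmp is a meaningful equality test.
template <typename T>
inline constexpr bool memcmp_comparable =
    std::has_unique_object_representations_v<T>;

// Member-wise equality never looks at padding bytes.
inline bool equal(const S& x, const S& y) {
    return x.c == y.c && x.i == y.i;
}
```

On platforms where `int` is aligned more strictly than `char`, `memcmp_comparable<S>` is false, which is the compile-time version of the warning in the comments above.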

> Some of which is not actually UB because the implementation defines it

No - if something is UB in the spec, it's UB. The implementation will do something, sure, but what it does is not fixed and may even change based on compiler version and optimization level.

> DWORD-sized memory access is atomic on Windows because Microsoft said it is

Well, Intel said it is. Mind you I don't think there are any 32-bit native architectures where aligned dword access isn't atomic. Unaligned, on the other hand ...

"Undefined behavior" in the C standard literally means "behavior which this C standard does not put any requirements on" - it says so in the definitions section of the C standard. Other things can still put requirements on it. MSVC isn't just a C++ compiler - it's a C++ compiler for x64 Windows and therefore follows the rules of C++, x64, and Windows all at once.

> No - if something is UB in the spec, it's UB.

A compiler is still free to ignore the spec and declare that something is not UB. However, this is very much compiler based, not platform based. Windows might guarantee that aligned DWORD-sized memory accesses are atomic, but that doesn't mean Clang when compiling for Windows would respect this - but MSVC might.
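
For code that should not depend on which party made the promise, the language-level route is `std::atomic`. A minimal sketch, with a hypothetical shared word `g_state`; the memory orders chosen here are one reasonable option, not the only one.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical shared state word; the name is illustrative.
std::atomic<std::uint32_t> g_state{0};

// Reports at compile time whether the platform implements this without locks.
constexpr bool kStateIsLockFree =
    std::atomic<std::uint32_t>::is_always_lock_free;

// Atomicity here is guaranteed by the C++ standard, not by the OS or CPU docs.
void publish(std::uint32_t value) {
    g_state.store(value, std::memory_order_release);
}

std::uint32_t observe() {
    return g_state.load(std::memory_order_acquire);
}
```

On common 32-bit and 64-bit targets this typically compiles down to plain aligned loads and stores, so the portable form costs little over relying on the platform guarantee.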

No, a compiler obviously cannot do this. Nothing is undefined behaviour under a known compiler, version, and settings. UB means you can't know what the code does in general, not that you can't know what it does in a very specific case.

UB has 2 very different implications:

1. It means that even if your program happens to work, it can't be portable

2. It means that even if your program happens to work today, it might stop working tomorrow when you add some new code, when you change some compiler flags, or when you do even a minor compiler upgrade

Of course, a compiler can't address 1. However, a compiler can very much address item 2. If Microsoft were to say "in MSVC, we define integer overflow to wrap", then they would guarantee that `INT_MAX + 1` will produce `INT_MIN` regardless of any optimization settings, any compiler upgrades, any other changes to the code. Of course, compiling the exact same program with Clang or GCC might cause it to crash or corrupt memory or anything else - but as long as you stuck with MSVC, your program would have perfectly defined semantics.

This is similar to using compiler extensions or intrinsics - they are not portable and not defined by the standard, maybe even explicitly defined to NOT be supported per the standard (such as variable length arrays in C++ in GCC), but they are nevertheless perfectly safe as long as you stick to your chosen compiler.

Edit to add: the integer overflow example is not just a theoretical possibility - lots of C++ compilers provide the `-fwrapv` flag; when using that flag, signed integer overflow is no longer UB for that program, it is defined just the same as unsigned integer overflow.
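
For code that cannot rely on `-fwrapv`, both behaviours can be written without touching signed overflow. This is a sketch; `checked_add` and `wrapping_add` are hypothetical names, and the final conversion in `wrapping_add` is guaranteed to be modular only from C++20 (earlier standards leave it implementation-defined, though mainstream compilers wrap).

```cpp
#include <cassert>
#include <climits>
#include <optional>

// Returns the sum, or nullopt if a + b would overflow int.
// The range checks themselves never overflow.
std::optional<int> checked_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return std::nullopt;
    if (b < 0 && a < INT_MIN - b) return std::nullopt;
    return a + b;
}

// Two's complement wrap-around via unsigned arithmetic, which is always
// well-defined; only the conversion back to int depends on the standard version.
int wrapping_add(int a, int b) {
    return static_cast<int>(static_cast<unsigned>(a) +
                            static_cast<unsigned>(b));
}
```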

There is a difference between UB in C, and something being undefined in some version of Microsoft C on Windows.

Much of C's UB is specifically, intentionally left undefined in the standard to express that code relying on some specific way it is handled is not proper, portable C. Indeed, the DWORD-sized memory access being atomic doesn't apply to MS Windows prior to version 3.0 running on an 80286.

It's UB because the ISO C spec says it's UB.

That is quite common in C developer culture: play loose and brace for impact.

> A truly zero-cost abstraction

Sadly the MSVC ABI makes std::span and std::string_view a pessimisation:

https://github.com/tringi/win64_abi_call_overhead_benchmark

https://godbolt.org/z/7baaox7re

Sounds like a compiler bug to me. It is a valid reason to avoid them in some rare cases right now, but it doesn't make the feature itself bad.

Those are ABI. Unless it is inlining them, the overhead is here to stay.

ABI changes do happen. gcc had an ABI change in std::string because of C++11. It was long and painful, but everyone survived, the world did not end.

> ABI changes do happen

Will never happen on Windows, especially not in user-mode libraries, and especially not something this pervasive.

Contrary to the FOSS compile from source culture, other platforms have a different point of view on ABI breaks.

Which is why Valve ended up using Proton.

I'm pretty sure GCC has been ABI stable far longer than MSVC, which used to break ABI every release.

GCC was forced to break the std::string ABI by the C++11 standard and they have been lobbying ever since against ABI breaks.

> but `std::span` should have been there decades ago

Absolutely! I now use it consistently in all new projects where I can afford to mandate C++20. I guess nobody bothered to make a proposal before...

They did in C, from one of the language authors even, and it was not accepted.

https://www.nokia.com/bell-labs/about/dennis-m-ritchie/varar...

By the way, Extended Pascal, Mesa/Cedar and Modula-2 all have them, under the name of open arrays.

Basically it took Go, C# and others for C++ to finally get its span.

C probably never will.

Everybody knows that C++ did not invent the concept of spans and that it was late to the party. It doesn’t change the fact that (presumably) nobody made a proposal to the C++ standard.

> It doesn’t change the fact that (presumably) nobody made a proposal to the C++ standard.

There were proposals about this for many years. C++ is just a terrible programming language, standardized by a committee (WG21) which exists in large part to boost the ego of one man, Bjarne Stroustrup.

N3851 for example wants to name this idea "array_view" which like "string_view" is an impressively unwieldy name for a core language feature, because of course neither of these were actually proposed as core language features even though that's what they naturally should be -- but it is basically the slice type or as you (and modern C++) call it a "span".

It's true that you can't change facts but what you've got here was a belief which was unfounded, not a fact.

> There were proposals about this for many years.

I wrote "presumably", but you are 100% correct. I'm always happy to be proven wrong.

N3851 actually deals with multi-dimensional spans and goes way beyond a simple slice/span type. To me it seems closer to std::mdspan than std::span.

The earliest proposal I could find that does propose something similar to std::span dates back to 2012: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n33...

I really don't understand why this was not pursued further. At the very least, this should have made it into C++17 together with std::string_view.

> because of course neither of these were actually proposed as core language features even though that's what they naturally should be

Should it really? What would this even look like in C++? IMO std::span works perfectly fine as a library type.

> C++ is just a terrible programming language, standardized by a committee (WG21) which exists in large part to boost the ego of one man, Bjarne Stroustrup.

That's certainly not the reason why it was standardized. Pre-C++98 was the wild west, with every compiler offering their own (incompatible) idea of what C++ is. Yes, there are many problems with design by committee in general (and the C++ committee in particular), but there was a very good reason for standardizing the language. The committee is not a one man show and there are many occasions where Bjarne has publicly voiced his frustration and disagreement.

> The committee is not a one man show

Of course it isn't, all the great egotists need a parade of sycophants to heap praise on them, you've doubtless seen modern US "Cabinet meetings" in which TV hosts newly elevated to run parts of the US government compete with experienced politicians as they all try to offer the most effusive praise for their snoring God King.

Personally, I'd throw up, but then I'm very much of Groucho Marx's view on such things.

Microsoft made the proposal for C++, after the Midori project and Office security improvements.

Judging by your comment, you have no clue how it came to be.

Proposal is linked in another comment of mine.

Well, you could have linked an actual proposal instead of dropping some cool facts about C, Extended Pascal, Mesa/Cedar and Modula-2, as if that explained anything.

> I understand that people can have many things to say about C++, and I do as well, but `std::span` should have been there decades ago (...)

Decades is kind of a stretch. C++11 introduced smart pointers, and finally getting C++0x out of the door was already a major victory. Given the history of C++, it would be unrealistic to introduce something like std::span before C++17.

Meantime, some organizations are still struggling to migrate to something like C++14.

It could have been there since the beginning, given that open arrays (aka spans) already existed in other languages, and there was even a failed proposal from Dennis Ritchie regarding C.

The C++ span proposal came from Microsoft,

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p01...

> already existed in other languages

This argument is moot. The issue with spans is not that they require cutting edge technology to deliver.

Before commenting, perhaps you should research why even Dennis Ritchie himself could not sell his idea to C.

It's funny how every single idea that's rejected is blindly lauded as brilliant but silenced due to some kind of conspiracy, and only the ideas that emerged are somehow bad, unacceptable, or late. Is the point to feel outraged?

Easy, even one of the authors could not change WG14's mind towards security.

Governments, related cybersecurity agencies, and companies are the ones getting outraged when looking at money spent on cyber attacks due to memory corruption issues.

WG14 adopted variably modified types, a kind of dependent type. From a security standpoint it offers all the same qualities. It also in principle was easier to integrate from a backwards compatibility standpoint, with the exception of struct member analogs (which we now have but aren't yet standardized).

Maybe we would have been better off with Ritchie's counter proposal. But neither proposal was chiefly concerned with security, thus no proposals for, e.g., automatic bounds checking.

Just to be clear, I often think we would have been better off with Ritchie's proposal, assuming it would have seen at least as much adoption in implementations and usage as variably modified types, which sadly remained poor for many years after C99, and arguably still poor. But being better off doesn't mean being in a drastically better situation than we are today from a security perspective. The proposed alternatives were prerequisites for substantively improving security, but far from sufficient. And the delay in adopting and refining variably-modified types has cost much more than whatever marginal benefit Ritchie's proposal offered. Ditto for other gaps, like better facilities for handling arithmetic, e.g. overflow and mixed type comparisons. The first step in addressing overflow only came with C23 (overflow checking routines), and the latter only in the forthcoming C2y (typesafe, mixed-signedness min/max, etc).

> Easy, even one of the authors could not change WG14's mind towards security.

Your comment conveys a hefty dose of ignorance on the topic. I recommend you read the proposal's arguments, including how it required breaking the ABI.

Are you asserting that WG14 never had the necessary skills among all the members to help improve this proposal, or dare to bring another one during the last 40 years?

Afaik std::span does not need anything that was not in C++98 already, or am I missing something?

> Afaik std::span does not need anything that was not in C++98 already, or am I missing something?
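
As a rough illustration of that claim, here is a sketch of a span-like view that sticks to C++98 language features. `span_lite` and `sum` are hypothetical names, and this is a deliberately tiny subset of what `std::span` offers, not a reimplementation of it.

```cpp
#include <cassert>
#include <cstddef>

// Non-owning view over a contiguous range: just a pointer and a length.
template <typename T>
class span_lite {
public:
    span_lite() : data_(0), size_(0) {}
    span_lite(T* data, std::size_t size) : data_(data), size_(size) {}

    // Deduces the length of a built-in array from its type.
    template <std::size_t N>
    span_lite(T (&arr)[N]) : data_(arr), size_(N) {}

    T* begin() const { return data_; }
    T* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

    T& operator[](std::size_t i) const {
        assert(i < size_);  // debug-only bounds check
        return data_[i];
    }

    // Returns a view of `count` elements starting at `offset`.
    span_lite subspan(std::size_t offset, std::size_t count) const {
        assert(offset <= size_ && count <= size_ - offset);
        return span_lite(data_ + offset, count);
    }

private:
    T* data_;
    std::size_t size_;
};

// Example consumer: takes a read-only view instead of pointer plus length.
int sum(span_lite<const int> values) {
    int total = 0;
    for (const int* p = values.begin(); p != values.end(); ++p) {
        total += *p;
    }
    return total;
}
```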

You're missing the fact that following C++98 it took around 13 years to get the next version of the standard published.