Hacker News new | ask | show | jobs
by MaulingMonkey 2 days ago
> Fair point (although to be honest: 'complexify' feels a bit of an exaggeration here to me)

Both uint8_t and std::byte require a header (<cstdint> or <cstddef>) which may expose you to platform x config specific build failures if you do any conditional #including, and the latter is a whole damn enum class with a strange adversion to arithmetic, where `byte |= 1` becomes `byte |= std::byte(1)`, `byte += 1` becomes `byte = std::byte(std::to_integer<std::uint8_t>(byte) + 1);`, and both become something you can accidentally step into in your full debug builds because it's an actual function call (at least on MSVC - still extra instructions on clang/gcc, but I can see the dang call instruction on MSVC!) instead of a compiler built in.

Not to mention, neither is vanilla C++03... I threw a `std::byte` example in a quick godbolt snippet and MSVC wouldn't compile without adding /std:c++17, because of course it defaults to earlier. Which is silly, but that's also the story of my life.

And don't get me wrong - that's all relatively minor - but it's all for middling to negative value IME. `void*` is frequently clearer - it's a signal that it's an opaque blob at this point in the code, and that something else will try to give it meaning later. I struggle to think of a single bug that I've encountered, that would've been caught by the compiler had I used `std::byte` over `unsigned char` or `void`. And conversely, I've seen APIs accepting `std::byte` but requiring higher alignment, where with `void` I might not have dropped my guard as much.

> `std::span`

At least manages to bind pointer and size into a single variable, which IME at least has the advantage of eliminating some bugs (e.g. mismatching pointers and sizes) and allowing some nifty utility functions to become a lot more wieldy. You can do things like feed it an array and not have to do any of your own `sizeof(...)` shenannigans. At this point you're possibly getting into positive expected value, but I'm going to eye roll at pull requests refactoring `void*` based stuff to use it unless I see at least one actual concrete example of calling code improving alongside it - I don't want just hypothetical theoretical ergonomics, I want actual concrete ergonomics!

2 comments

> `void*` is frequently clearer - it's a signal that it's an opaque blob at this point in the code, and that something else will try to give it meaning later.

And that's fine, until something else gives it the wrong meaning later. If you're just plumbing, and you're pumping around opaque blobs, if somewhere in the plumbing you connect the wrong source to the wrong destination, you get no warning.

Yes, in this case, void* is kind of smelly. If the intent of your function is to receive a const struct MyCustomData*, then that should be the type of the argument. If you later need to handle a const struct MyOtherCustomData*, you can add an overload that takes that argument. Or use a template as others pointed out. Use the type system to help you, so you're warned if you try to pass the method const struct BadCustomData* by accident.

If you truly don't know what the underlying structure of the "blob" of data is, sure, go ahead and use void* and explicitly convert the pointer type when you know what it is, but at least add a comment that you're entering the danger zone.

To be fair, the `void*` is already a pretty big hint that you're in the danger zone.
> if somewhere in the plumbing you connect the wrong source to the wrong destination, you get no warning.

A valid concern. I've been in the position of having to fix those bugs.

I'm not going to recommend `qsort` over `std::sort` or anything like that. I've seen `void*` "user data" pointers in C APIs and decided to stuff runtime checkable handle values in there instead of real pointers to blobs. In Rust, this can mean avoiding some unnecessary `unsafe { ... }` blocks, since while reading anything from a `*const c_void` requires such things, reading from a global `Mutex<HashMap<*const c_void, Arc<dyn Any>>>` or similar nonsense does not. I'll eat the performance hit unless I'm worrying about an actual hot path!

And I'm not going to stand in the way of newtype wrappers around void pointers either. I'll +1 the PRs with `class ASpecificKindOfBlob { void* data; size_t length; };` and suchlike as well, if `std::span`s aren't your type of thing, and abide by measures to centralize such plumbing in one place such that the opportunities to make a mistake are fewer.

But sometimes a blob is just a blob, and breaking out `std::byte` is putting lipstick on a pig. And it's not even the right color lipstick.

> and the latter is a whole damn enum class with a strange adversion to arithmetic, where `byte |= 1` becomes `byte |= std::byte(1)`, `byte += 1` becomes `byte = std::byte(std::to_integer<std::uint8_t>(byte) + 1);`, and both become something you can accidentally step into in your full debug builds because it's an actual function call (at least on MSVC - still extra instructions on clang/gcc, but I can see the dang call instruction on MSVC!) instead of a compiler built in.

That's the point, std::byte is for opaque bytes. You're not expected to do arithmetic directly just like you can't do arithmetic on void.

> That's the point, std::byte is for opaque bytes. You're not expected to do arithmetic directly just like you can't do arithmetic on void.

That'd make a whole lot more sense to me if `std::byte` didn't explicitly implement a whole bunch of bit twiddling operations:

    << >> |= &= ^= | & ^ ~
This isn't opaque. This is a poor man's half-assed `std::bitset<8>` with a guaranteed `sizeof(std::byte) == 1`.