Hacker News
by kstenerud 11 days ago
The real question here is: WHY are you passing a blob of memory rather than a struct that uses the type system to describe and enforce what the contents are?

I don't mean dressing up an anonymous pointer, which the author rightly complains about. I mean WHY are you making an API that takes such a pointer to an unknown type to begin with? Whenever you change the structure within that blob, your type checker won't flag that the receiver hasn't been updated to handle it.

Even worse: nothing's stopping you from accidentally passing in the wrong type.

And now you have a SEGV. Or a security hole.
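A minimal C++ sketch of the contrast being drawn here. The `Header` type and both functions are hypothetical, not taken from the article. The blob version accepts any pointer at all, while the typed version only accepts a `Header`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical message type, for illustration only.
struct Header {
    std::uint32_t id;
    std::uint16_t flags;
};

// Blob-style API: the compiler has no idea what `data` points to,
// so it trusts the caller to pass at least sizeof(Header) readable bytes.
std::uint32_t header_id_from_blob(const void* data) {
    assert(data != nullptr);
    Header h;
    std::memcpy(&h, data, sizeof h);
    return h.id;
}

// Typed API: the parameter type documents and enforces the contents.
std::uint32_t header_id(const Header& h) {
    return h.id;
}
```

With `std::uint16_t x = 0;`, the call `header_id_from_blob(&x)` still compiles but reads past the end of `x` (undefined behavior), whereas `header_id(x)` is rejected at compile time.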


> Whenever you change the structure within that blob, your type checker won't flag that the receiver hasn't been updated to handle it.

The relevant type is "blob". There is no further structure. If the function that accepts void* is trying to extract structure out of the blob, there is a bug in that function and the type checker should already catch you trying to extract structure from something that isn't there.

> I mean WHY are you making an API that takes such a pointer to an unknown type to begin with?

It's not unknown in any meaningful sense. It is known to be a sequence of 'arbitrary' datums of a given length, which is exactly the type of input the given scenario requires.

As the article explores, some argue that you should define that sequence with a concrete type, but the article states that doing so offers no additional value, as it posits that void* already communicates the same thing. In other words, it suggests that void* already is the concrete type for that sequence.
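For the position argued here, a sketch of a `void*`-plus-length function that never tries to extract structure from its input: it only copies bytes, much like `std::memcpy` or `fwrite`. The name `append_bytes` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `len` raw bytes starting at `data` to `out`.
// It never interprets the bytes, so `const void*` plus a length is all
// it needs to know about the input.
void append_bytes(std::vector<unsigned char>& out, const void* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    // Reading an object through unsigned char is the well-defined way to access its bytes.
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    out.insert(out.end(), bytes, bytes + len);
}
```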

For type erasure (it’s sometimes useful), custom allocators, I/O, for example.
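As a sketch of the type-erasure case: C-style callback APIs often thread a `void*` context through code that never inspects it, and only the callback, which knows the real type, casts it back. All names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical callback type: the traversal code only hands `ctx` back, never reads it.
using Callback = void (*)(void* ctx, int value);

void for_each_value(const int* values, std::size_t n, Callback cb, void* ctx) {
    assert(values != nullptr || n == 0);
    for (std::size_t i = 0; i < n; ++i) {
        cb(ctx, values[i]);
    }
}

struct Sum {
    long total = 0;
};

// The callback restores the erased type. Nothing checks that `ctx` really
// points to a Sum, which is the risk raised in the top comment.
void add_to_sum(void* ctx, int value) {
    static_cast<Sum*>(ctx)->total += value;
}
```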
Then pass a uint8_t* with a size, a.k.a. a span<uint8_t>.
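A sketch of that suggestion. `std::span<const std::uint8_t>` requires C++20, so this uses a hypothetical two-field `ByteSpan` stand-in that also compiles under older standards; `sum_bytes` is likewise only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for std::span<const std::uint8_t>: the pointer and
// the length travel together, and the element type says "bytes".
struct ByteSpan {
    const std::uint8_t* data;
    std::size_t size;
};

// The signature alone tells callers this function treats its input as bytes.
std::uint32_t sum_bytes(ByteSpan bytes) {
    assert(bytes.data != nullptr || bytes.size == 0);
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < bytes.size; ++i) {
        total += bytes.data[i];
    }
    return total;
}
```

With C++20 the parameter would simply be `std::span<const std::uint8_t>`, and a `std::vector<std::uint8_t>` or a `std::uint8_t` array could be passed to it directly.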
> The real question here is: WHY are you passing a blob of memory rather than a struct that uses the type system to describe and enforce what the contents are?

I completely agree. It's particularly egregious when the blogger complains that the complexity and ugliness lie in the type casting needed to force an incompatible type where it doesn't belong, and then uses a reinterpret_cast of all things.

This doesn't even feel like a strawman argument anymore. This sounds like a coding horrors entry.

I think the article names hashing as a use case, which I can still somewhat agree with: operations that depend only on the bytes, I guess. But yeah, most things worth saying about this article have already been said here.
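For the hashing case, a byte-only operation could look like this sketch of 64-bit FNV-1a, a simple non-cryptographic hash. FNV-1a is chosen here only for brevity and is not necessarily what the article uses; the function name is hypothetical. It reads its input strictly as bytes and never looks for structure.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a: XOR each byte into the state, then multiply by the FNV prime.
std::uint64_t fnv1a_64(const void* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= bytes[i];
        h *= 1099511628211ULL;                  // FNV 64-bit prime (unsigned wraparound)
    }
    return h;
}
```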
Sure, if the function is expected to not treat the data as anything but bytes, then it might be acceptable in narrow circumstances.

But in such a case I'd argue FOR the ceremony, as a way of declaring from the API "The input is a sequence of bytes that I won't treat as anything other than a sequence of bytes", and declaring from each and every call site: "This is not a mistake; we really are 'converting' this struct to a series of bytes for this function to consume".

Then anyone auditing the code knows the intent by the shape of the types, and would quickly flag any typecasting shenanigans within the receiver function.
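One way to get that ceremony, sketched with hypothetical names: every call site must go through a helper to view an object as bytes, the helper only accepts trivially copyable types, and its single reinterpret_cast lives in one audited place. The receiver's parameter type says it handles nothing but bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical read-only byte view: the receiver's signature promises nothing more.
struct ByteView {
    const unsigned char* data;
    std::size_t size;
};

// Call sites must write as_raw_bytes(obj), which makes the "struct to bytes"
// conversion visible and greppable instead of hiding it in an implicit void*.
template <typename T>
ByteView as_raw_bytes(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types have a meaningful byte image");
    return ByteView{reinterpret_cast<const unsigned char*>(&obj), sizeof obj};
}

// Receiver: treats its input purely as a sequence of bytes.
std::size_t count_nonzero_bytes(ByteView bytes) {
    assert(bytes.data != nullptr || bytes.size == 0);
    std::size_t n = 0;
    for (std::size_t i = 0; i < bytes.size; ++i) {
        if (bytes.data[i] != 0) {
            ++n;
        }
    }
    return n;
}
```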

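But even then, hashing a struct will rapidly bring you into the land of dragons and fairies. Abandon all hope if you have floats or UTF-8 text, both of which can have multiple byte representations for the same value (e.g. +0.0 and -0.0, or a precomposed "é" versus "e" followed by a combining accent).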

Far better to remain type-aware if you value your sanity.
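A sketch of staying type-aware for a hypothetical `Point` of two floats: hash each field by value after mapping -0.0f to +0.0f, so values that compare equal also hash equally. The FNV-1a mixing is an arbitrary choice for brevity, and NaN is deliberately not handled, since it never compares equal to itself anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record type for illustration.
struct Point {
    float x;
    float y;
};

// Mixes n bytes into a 64-bit FNV-1a state.
static std::uint64_t fnv1a_mix(std::uint64_t h, const unsigned char* p, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hashes a float by value: -0.0f == +0.0f, so map both to +0.0f first.
// NaN is not canonicalized here.
static std::uint64_t mix_float(std::uint64_t h, float f) {
    if (f == 0.0f) {
        f = 0.0f;
    }
    unsigned char buf[sizeof f];
    std::memcpy(buf, &f, sizeof f);
    return fnv1a_mix(h, buf, sizeof f);
}

// Field-by-field hash: padding and the sign of zero never leak into the result.
std::uint64_t hash_point(const Point& p) {
    std::uint64_t h = 14695981039346656037ULL;
    h = mix_float(h, p.x);
    h = mix_float(h, p.y);
    return h;
}
```

Unlike hashing the raw `sizeof(Point)` bytes, `hash_point(Point{0.0f, 1.5f})` and `hash_point(Point{-0.0f, 1.5f})` agree.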

I agree, the original article is rather questionable. I don't write code like the article advocates. I would probably go for overloads for each data type I have considered and tested, or maybe something fully templated, or std::span/boost::span (interestingly enough, a hash function is the very example the Boost docs give to illustrate boost::span).
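A sketch of the overload approach, with a hypothetical `hash_value` family built on `std::hash`: each type that has been considered gets its own overload, and a deleted catch-all template turns every other type into a compile error instead of a silent byte hash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Catch-all: any type without a dedicated overload fails to compile.
template <typename T>
std::size_t hash_value(const T&) = delete;

// Overloads for the types that have been considered and tested.
std::size_t hash_value(int v) {
    return std::hash<int>{}(v);
}

std::size_t hash_value(double d) {
    // -0.0 == 0.0, so normalize first to keep equal values hashing equally.
    return std::hash<double>{}(d == 0.0 ? 0.0 : d);
}

std::size_t hash_value(const std::string& s) {
    return std::hash<std::string>{}(s);
}
```

Because the deleted template is an exact match, this also rejects `float` arguments and string literals that would otherwise convert silently; they need their own overloads once they have been considered.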
A more immediate concern for hashing by treating a struct as a bag of bytes is padding.
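A sketch of rejecting padded types at compile time, assuming C++17 for `std::has_unique_object_representations`. The `Padded`/`Unpadded` types and `safe_to_hash_as_bytes` are hypothetical, and the expected results assume a common ABI (such as x86-64 or AArch64) where `std::uint32_t` is 4-byte aligned.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical types; layouts assume std::uint32_t is 4-byte aligned.
struct Padded {
    std::uint8_t tag;     // usually followed by 3 padding bytes
    std::uint32_t value;
};

struct Unpadded {
    std::uint32_t value;
    std::uint32_t tag;
};

// True only when objects with equal values are guaranteed to have identical bytes:
// no padding (GCC and Clang also report false for floating-point types).
template <typename T>
constexpr bool safe_to_hash_as_bytes() {
    return std::is_trivially_copyable<T>::value &&
           std::has_unique_object_representations<T>::value;
}

// A byte-hashing template could then start with:
// static_assert(safe_to_hash_as_bytes<T>(), "type has padding or non-unique bytes");
```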
Hashing everything based on the byte representation breaks when you have a type where equality does not imply byte equality, such as floats: +0 and -0 compare equal but have different byte representations.
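A minimal check of that claim (the function name is hypothetical): `+0.0f` and `-0.0f` compare equal, yet `std::memcmp` sees different bytes.

```cpp
#include <cassert>
#include <cstring>

// True when two floats compare equal but their object representations differ,
// i.e. exactly the case where a byte-wise hash disagrees with operator==.
bool equal_but_bytes_differ(float a, float b) {
    return a == b && std::memcmp(&a, &b, sizeof a) != 0;
}
```

The reverse failure also exists: a NaN never compares equal to itself, even when its bytes are identical.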
It depends on the use case. Hashing can also be used for integrity or change detection, in which case you want exactly bit-exact equality, even for arbitrary structs. Maybe that's somewhat niche; I mention it because I actually have such a use case.
Even then, accepting a uint8_t* would make this intent clearer.
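A sketch of the change-detection use case with the parameter spelled as `const std::uint8_t*` plus a length. `ChangeDetector` is hypothetical; for simplicity it keeps a full copy and compares with `std::memcmp`, whereas a real integrity check would more likely store a digest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical detector: reports whether a buffer differs, bit for bit,
// from the last one it saw. If a caller feeds in a struct's bytes, differences
// in padding or in the sign of a zero count as changes, as a bit-exact check wants.
class ChangeDetector {
public:
    bool changed(const std::uint8_t* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        const bool differs =
            len != last_.size() ||
            (len != 0 && std::memcmp(data, last_.data(), len) != 0);
        last_.assign(data, data + len);
        return differs;
    }

private:
    std::vector<std::uint8_t> last_;
};
```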