by simcop2387 2811 days ago
Not quite. (void *) is special in this regard IIRC, specifically for cases like this. It's a valid pointer, but you're required to ensure that platform requirements such as alignment and size match what you're doing.

No, (void⁎) is not special in this way. It just makes things look nicer.

The code also isn't undefined behavior... but you are really asking to hit compiler bugs! This is an easy way to confuse gcc into wrongly determining that the code has undefined behavior, and if gcc gets confused then it may determine that a code path can't be taken. Code paths that can't be taken may be deleted.

The main rule here is that memory has a type, which is determined by what was last written into it, and you may only read or examine the memory using that type. (For the type, we ignore attributes like the distinction between signed and unsigned.) There is a minor exception that is just enough to implement something like memcpy: you may use a (char⁎) to read and then write as a char. You still aren't supposed to look at that char. These rules apply to memory accessed via pointers, no matter how you cast them, and to memory accessed via union members.

Real compilers differ from that:

Every compiler I'm aware of will not enforce the rules for unions. The gcc compiler promises not to enforce the rules in this case.

Every compiler I'm aware of will let you look at any data that has been read as a char, so the memcpy trick works and you can do things like determine endianness at runtime.
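For anyone who wants to see the char trick spelled out, here is a minimal sketch of a runtime endianness check. The function names are made up for illustration, and the second variant does the same inspection through memcpy into a byte array.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the lowest-addressed byte of a uint32_t holds its least
 * significant byte (little-endian host), 0 otherwise. */
static int is_little_endian(void)
{
    uint32_t probe = 1;
    /* Inspect the object representation through a character pointer. */
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 1;
}

/* Same check, but copying the bytes out with memcpy first. */
static int is_little_endian_memcpy(void)
{
    uint32_t probe = 1;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 1;
}
```

Both variants should agree on any given machine; which value they return depends on the host.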

It is legit to initialize a type X variable, take the address of it, cast it from (X⁎) to (Y⁎), pass it through arbitrary data structures and functions to hide the origin from the compiler, cast the (Y⁎) back to (X⁎), and then access the type X variable. If you do this, gcc may generate bad code.

Is there a way then to write compliant/non-UB/non-buggy memory allocator/GC in C/C++?
The moment you call sbrk or mmap, you're outside of standard C, so no. Treating a pointer as an integer in order to mess with the bits is also a violation.

Aside from that, the style used here is probably OK. It is hard to say what exactly would trigger the gcc bugs, but I'm pretty sure that a recent gcc would be OK for this code.

> Treating a pointer as an integer in order to mess with the bits is also a violation.

Violation of what exactly? Converting a pointer to an integer, and vice versa, is implementation defined. As long as you're not trying to write implementation-independent code, it's perfectly fine.

Sure, you can convert, but the whole point of converting is to do things that violate the C standard. In theory, a standard-conforming C implementation could have the bits of a pointer be encrypted by the CPU. There is nothing meaningful that you could do to those bits. Rounding pointers up or down for alignment is impossible in standards-conforming C code.
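For reference, the kind of code being argued about looks roughly like the sketch below. It assumes the implementation provides uintptr_t and that the integer value behaves like a flat address, which is what the standard does not promise; align_up is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

/* Round ptr up to the next multiple of alignment.
 * Assumptions: alignment is a power of two, uintptr_t exists, and the
 * pointer-to-integer mapping is a plain flat address (implementation-defined). */
static void *align_up(void *ptr, size_t alignment)
{
    uintptr_t addr = (uintptr_t)ptr;
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (void *)((addr + mask) & ~mask);
}
```

On the usual desktop and server platforms this does what you expect, but that is a property of the implementation, not of the language.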
So UB, then? All compliant solutions I could find to this use memcpy() instead. Some use __attribute__ ((__packed__)) for structs, which seems to have its own gotchas (https://stackoverflow.com/a/7956942), but that's not relevant here.
You can't use memcpy here, you have to modify the original memory to mark it as freed. You also can't allocate a new buffer to do the copy into because you are the allocator.

The C standard specifically states that a cast from (T *) to (void *) and back is defined behavior and that you must get the original pointer back. It's also valid to go from (T *) to (U *) and back again if and only if T and U have the same alignment requirements. (void *) is required to have no alignment requirements because you can't directly dereference it, since the result would have type (void).

__attribute__((__packed__)) is a completely different topic about how the layout of the struct gets decided and what padding might be used.

What's going on here is that malloc() gives the caller a (void *) that points to one position past a (struct header_t *), and then free() casts it back from (void *) to (struct header_t *) and goes back one element to get the original header with the metadata about the allocation. This is perfectly fine because the pointer is never anything other than (void *) or (struct header_t *) as far as the language is concerned. The caller might turn the element after the (struct header_t *) into something else, but the original (struct header_t *) is always the same.
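A rough sketch of that pattern, in case it helps. The fields of struct header_t and the toy_ function names are hypothetical, and the backing storage comes from the standard malloc() here only to keep the example short; the allocator under discussion gets its memory elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header layout; the real fields are not shown in this thread. */
struct header_t {
    size_t size;
    int is_free;
};

/* Hand out a pointer to the position just past the header. */
static void *toy_alloc(size_t size)
{
    struct header_t *header = malloc(sizeof *header + size);
    if (header == NULL)
        return NULL;
    header->size = size;
    header->is_free = 0;
    return (void *)(header + 1);
}

/* Cast back to the header type first, then step back one element. */
static struct header_t *toy_header(void *block)
{
    return (struct header_t *)block - 1;
}

/* Release the whole allocation, header included. */
static void toy_release(void *block)
{
    if (block != NULL)
        free(toy_header(block));
}
```

The pointer the caller sees is only ever a (void *), and the pointer the allocator works with is only ever a (struct header_t *).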

"You also can't allocate a new buffer to do the copy into because you are the allocator."

You can use the stack in this case.
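Something along these lines, as a sketch: copy the header bytes into a local, flip the flag, and copy them back. The header layout and function names are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header layout. */
struct header_t {
    size_t size;
    int is_free;
};

/* Mark the block as freed without dereferencing a header pointer:
 * the header is copied to the stack, edited, and copied back. */
static void mark_freed(void *block)
{
    struct header_t copy;
    unsigned char *raw = (unsigned char *)block - sizeof copy;
    memcpy(&copy, raw, sizeof copy);
    copy.is_free = 1;
    memcpy(raw, &copy, sizeof copy);
}

/* Read the flag the same way. */
static int is_marked_free(const void *block)
{
    struct header_t copy;
    memcpy(&copy, (const unsigned char *)block - sizeof copy, sizeof copy);
    return copy.is_free;
}
```

Compilers typically turn fixed-size memcpy calls like these into plain loads and stores, so the cost is usually small.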

> The caller might turn the element after the (struct header_t *) into something else

Isn't this exactly the issue with this malloc implementation? The pointer returned that points past the header by sizeof(header) may not be aligned for subsequent types.

This is a problem inherent to all malloc() implementations: since malloc() never gets any type information, it can't make adjustments for the alignment of the final type that's involved. I'm not actually sure what the fully correct way to ensure that would be.
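One common approach, sketched here under the assumption of a C11 compiler, is to pad the header so its size is a multiple of the strictest fundamental alignment, using max_align_t. The union and helper names are made up for illustration.

```c
#include <stdalign.h>
#include <stddef.h>

/* Header padded and aligned like the most strictly aligned standard type.
 * The metadata fields are hypothetical. */
union aligned_header {
    struct {
        size_t size;
        int is_free;
    } meta;
    max_align_t force_alignment;  /* never read; only forces size and alignment */
};

/* The payload starts right after the header. If the header itself is
 * aligned for max_align_t, the payload is too. */
static void *payload_of(union aligned_header *header)
{
    return header + 1;
}
```

This covers any type with fundamental alignment; over-aligned types still need something like aligned_alloc().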
You too are losing your asterisks. Maybe a bit of Unicode can help. The only thing not stripped out by Hacker News seems to be this:

⁎ e2 81 8e

Damn markdown :). Looks like it's far too late for me to fix it now, but I'll have to keep that in mind in the future.
( * Or an asterisk surrounded by two spaces; * )
The piece of memory was previously allocated with a struct header_t at sizeof(struct header_t) bytes below the passed in block pointer.

So just recovering and using a pointer to that struct header_t from the block pointer in this way is fine.

So I suppose that header = (struct header_t*)(block - 1); would be UB then.
Haven't read the article but if block is void *, this won't compile because sizeof(void) is not defined. And they are probably trying to subtract the sizeof(struct header) hence the expression in your original comment.
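To make the difference concrete, here is a sketch of two standard ways to step back by one header, assuming a hypothetical header layout. Note that gcc accepts arithmetic on void * as an extension and treats sizeof(void) as 1, so with gcc (block - 1) compiles but moves back a single byte rather than one header.

```c
#include <stddef.h>

/* Hypothetical header layout. */
struct header_t {
    size_t size;
    int is_free;
};

/* Cast first, then subtract one element. */
static struct header_t *header_by_element(void *block)
{
    return (struct header_t *)block - 1;
}

/* Subtract the header size in bytes, then cast. */
static struct header_t *header_by_bytes(void *block)
{
    return (struct header_t *)((char *)block - sizeof(struct header_t));
}
```

Both forms land on the same address as long as block really points just past a struct header_t.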