by paavoova 2850 days ago

  header = (struct header_t*)block - 1;

Isn't this UB?


garethrees 2850 days ago

The cast is defined by ISO/IEC 9899:1999 §6.3.2.3, paragraph 1, since block is a pointer to void, and struct header_t is an object type:

"A pointer to void may be converted to or from a pointer to any incomplete or object type."

The subtraction is defined by §6.5.6, paragraph 8, provided that block points to an element of a large enough array object:

"When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression."

(There's similar text in other versions of the C standard.)
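A minimal sketch of the two steps discussed above (the conversion from void * and the one-element subtraction). The fields of struct header_t are assumed here for illustration, and payload_from_header and header_from_payload are hypothetical helper names, not taken from the article:

```c
#include <stddef.h>

/* Hypothetical header layout; the article's struct header_t may differ. */
struct header_t {
    size_t size;
    int is_free;
};

/* Hand out the address one element past the header. */
void *payload_from_header(struct header_t *header)
{
    return (void *)(header + 1);
}

/* Convert void * back to struct header_t *, then step back one element.
   The cast binds tighter than the subtraction, so the step is
   sizeof(struct header_t) bytes, not one byte. */
struct header_t *header_from_payload(void *block)
{
    return (struct header_t *)block - 1;
}
```

The subtraction only stays inside the rules quoted above if block really is the address that payload_from_header produced for a header in the same array.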

burfog 2850 days ago

That part of the standard only covers the cast. It means that you won't mangle a pointer if you cast it to a void pointer and then back to the original pointer type.

Accessing the data that is being pointed at is another matter entirely. You must satisfy alignment constraints. You also must not read any memory as a type other than what it was written as, aside from a very limited exception for type char.
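As a rough illustration of the access rule, the usual way to look at the bytes of one type as another without dereferencing a mistyped pointer is to copy them with memcpy. This is only a sketch: float_bits is a hypothetical name, and the code assumes float and uint32_t have the same size, which is common but not guaranteed:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* Return the object representation of f as a 32-bit integer.
   The bytes are copied; no uint32_t lvalue ever aliases the float. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

By contrast, an expression such as *(uint32_t *)&f reads a float object through a uint32_t lvalue, which is the kind of access the comment above warns about.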

garethrees 2850 days ago

There doesn't seem to be any pointer dereference on the line that paavoova quoted, so I don't see how your comment applies.

burfog 2850 days ago

The trouble is that it is very easy to interpret things the wrong way. You showed that the casting is OK. People will tend to wrongly assume that they are home free at that point, and everything will be standards-compliant. Most people don't realize that the dereference itself can be a problem. Casting is very frequently followed by non-compliant dereferences. The gcc warnings about strict aliasing do not catch all the problems. Adding more casts, including one to a void pointer, is a common way to make warnings go away without actually stopping the compiler from breaking non-compliant code.

Looking at the full code on the web site, I think it is compliant but dangerous. It is decently likely to trigger gcc bugs.

simcop2387 2850 days ago

Not quite, (void *) is special in this regard IIRC, specifically for cases like this. It's a valid pointer but you're required to ensure that any platform requirements like alignment and size will match the requirements of what you're doing.

burfog 2850 days ago

No, (void⁎) is not special in this way. It just makes things look nicer.

The code also isn't undefined behavior... but you are really asking to hit compiler bugs! This is an easy way to confuse gcc into wrongly determining that the code has undefined behavior, and if gcc gets confused then it may determine that a code path can't be taken. Code paths that can't be taken may be deleted.

The main rule here is that memory has a type which is determined by what was last written into it, and you may only read or examine the memory using that type. (For the type, we ignore attributes like the distinction between signed and unsigned.) There is a minor exception that is just enough to implement something like memcpy by using a (char⁎) to read and then write as a char. You still aren't supposed to look at that char. These rules apply to memory accessed via pointers, no matter how you cast them, and to memory accessed via union members.

Real compilers differ from that:

Every compiler I'm aware of will not enforce the rules for unions. The gcc compiler promises not to enforce the rules in this case.

Every compiler I'm aware of will let you look at any data that has been read as a char, so the memcpy trick works and you can do things like determine endianness at runtime.

It is legit to initialize a type X variable, take the address of it, cast it from (X⁎) to (Y⁎), pass it through arbitrary data structures and functions to hide the origin from the compiler, cast the (Y⁎) back to (X⁎), and then access the type X variable. If you do this, gcc may generate bad code.
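For the point above about looking at data read as a char, a runtime endianness check might be sketched like this. The function name is_little_endian is made up for the example, and the code only distinguishes little-endian from everything else:

```c
#include <stdint.h>

/* Returns 1 if the least significant byte of a uint32_t is stored
   first (little-endian), 0 otherwise. The bytes are inspected
   through unsigned char, the character-type exception. */
int is_little_endian(void)
{
    uint32_t probe = 1;
    const unsigned char *bytes = (const unsigned char *)&probe;
    return bytes[0] == 1;
}
```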

tomp 2850 days ago

Is there a way then to write compliant/non-UB/non-buggy memory allocator/GC in C/C++?

burfog 2850 days ago

The moment you call sbrk or mmap, you're outside of standard C, so no. Treating a pointer as an integer in order to mess with the bits is also a violation.

Aside from that, the style used here is probably OK. It is hard to say what exactly would trigger the gcc bugs, but I'm pretty sure that a recent gcc would be OK for this code.

nemetroid 2850 days ago

> Treating a pointer as an integer in order to mess with the bits is also a violation.

Violation of what exactly? Converting a pointer to an integer, and vice versa, is implementation defined. As long as you're not trying to write implementation-independent code, it's perfectly fine.

burfog 2850 days ago

Sure, you can convert, but the whole point of converting is to do things that violate the C standard. In theory, a standard-conforming C implementation could have the bits of a pointer be encrypted by the CPU. There is nothing meaningful that you could do to those bits. Rounding pointers up or down for alignment is impossible in standards-conforming C code.
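To make the disagreement concrete, here is a sketch of the kind of pointer rounding being discussed. It leans on the implementation-defined conversion to uintptr_t (an optional type) and assumes that the integer value behaves like a flat address, as it does on mainstream platforms; align_up is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of alignment (a power of two).
   Not portable ISO C: the meaning of the integer value of a pointer
   is implementation-defined. */
void *align_up(void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t aligned = (addr + (alignment - 1)) & ~(uintptr_t)(alignment - 1);
    /* Apply the difference as a byte offset to the original pointer. */
    return (char *)p + (aligned - addr);
}
```

The caller still has to make sure the rounded pointer stays inside the same object as p.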

paavoova 2850 days ago

So UB, then? All compliant solutions I could find to this use memcpy() instead. Some use __attribute__ ((__packed__)) for structs, which seems to have its own gotchas (https://stackoverflow.com/a/7956942), but that's not relevant here.

simcop2387 2850 days ago

You can't use memcpy here; you have to modify the original memory to mark it as freed. You also can't allocate a new buffer to do the copy into because you are the allocator.

The C standard specifically states that a cast from (T *) to (void *) and back is validly defined behavior and that you must get the original pointer back. It's also valid to go from (T *) to (U *) and back again if and only if T and U have the same alignment requirements. (void *) is required to have no alignment requirements because you can't directly dereference it, since the result would have type (void).

__attribute__((__packed__)) is a completely different topic about how the layout of the struct gets decided and what padding might be used.

What's going on here is that malloc() gives the caller a (void *) that points one position past a (struct header_t), and then free casts it back from (void *) to (struct header_t *) and goes back one element to get the original header with the metadata about the allocation. This is perfectly fine because the pointer is never anything other than (void *) or (struct header_t *) as far as the language is concerned. The caller might turn the element after the (struct header_t) into something else, but the original (struct header_t *) is always the same.

ahoka 2850 days ago

"You also can't allocate a new buffer to do the copy into because you are the allocator."

You can use the stack in this case.
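A sketch of what that could look like: copy the header into a local variable, change it there, and copy it back, so the heap bytes are only ever touched through memcpy. The header fields and the name mark_free are assumptions for illustration, not the article's code:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header layout. */
struct header_t {
    size_t size;
    unsigned is_free;
};

/* Mark the block as free: copy out, modify, copy back. */
void mark_free(void *block)
{
    unsigned char *raw = (unsigned char *)block - sizeof(struct header_t);
    struct header_t header;               /* temporary copy on the stack */
    memcpy(&header, raw, sizeof header);
    header.is_free = 1;
    memcpy(raw, &header, sizeof header);  /* write the change back */
}
```

The second memcpy is what addresses the objection that the original memory has to be modified.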

paavoova 2850 days ago

> The caller might turn the element after the (struct header_t) into something else

Isn't this exactly the issue with this malloc implementation? The pointer returned that points past the header by sizeof(header) may not be aligned for subsequent types.

simcop2387 2850 days ago

This is a problem inherent to all malloc() implementations: since malloc() never gets any type information, it can't make any adjustments for the alignment of the final type that's involved. I'm not actually sure what the fully correct way to ensure that would be.
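One common approach, sketched here under assumptions rather than taken from the article, is to give the header the strictest fundamental alignment so that the address just past it is suitable for any ordinary type. The struct aligned_header_t and both helpers are hypothetical names, and the code needs C11 for alignas and max_align_t:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header whose alignment (and therefore size) is a
   multiple of alignof(max_align_t). */
struct aligned_header_t {
    alignas(max_align_t) size_t size;
    unsigned is_free;
};

/* If header is correctly aligned, so is the payload right after it. */
void *aligned_payload(struct aligned_header_t *header)
{
    return header + 1;
}

/* Check an address against the strictest fundamental alignment.
   Uses the implementation-defined pointer-to-integer conversion. */
int is_max_aligned(const void *p)
{
    return (uintptr_t)p % alignof(max_align_t) == 0;
}
```

Over-aligned types (for example SIMD vectors) would still need something like aligned_alloc or posix_memalign.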

burfog 2850 days ago

You too are losing your asterisks. Maybe a bit of Unicode can help. The only thing not stripped out by Hacker News seems to be this:

⁎ e2 81 8e

simcop2387 2850 days ago

Damn markdown :). Looks like it's far too late for me to fix it now, but I'll have to keep that in mind in the future.

nothrabannosir 2850 days ago

( * Or an asterisk surrounded by two spaces; * )

noselasd 2850 days ago

The piece of memory was previously allocated with a struct header_t at sizeof(struct header_t) bytes below the passed in block pointer.

So just recovering and using a pointer to that struct header_t from the block pointer in this way is fine.

dsamarin 2850 days ago

So I suppose that header = (struct header_t*)(block - 1); would be UB then.

mav3rick 2850 days ago

Haven't read the article, but if block is void *, this won't compile because sizeof(void) is not defined. They are probably trying to subtract sizeof(struct header), hence the expression in your original comment.

tom_ 2850 days ago

gcc actually lets you do this: https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Pointer-Arith.h...
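For comparison, a sketch of the two spellings that are valid in standard C. With the GNU extension, block - 1 on a void * moves back a single byte, which is not where the header starts. The header fields and both function names are assumed for illustration:

```c
#include <stddef.h>

/* Hypothetical header layout. */
struct header_t {
    size_t size;
    int is_free;
};

/* Element arithmetic: cast first, then step back one header. */
struct header_t *header_by_element(void *block)
{
    return (struct header_t *)block - 1;
}

/* The same step written as explicit byte arithmetic through char *. */
struct header_t *header_by_bytes(void *block)
{
    return (struct header_t *)((char *)block - sizeof(struct header_t));
}
```

Both functions return the same address; only the unit of the subtraction is spelled differently.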

ahoka 2850 days ago

I think it would be better to go with char* instead of void*. A much better solution is to use a hashtable for your blocks, but I understand this is just a toy example.

I think he also forgot to implement memalign(3) and posix_memalign(3).