Hacker News new | ask | show | jobs
by woodruffw 1571 days ago
> I mean you can't even write an allocator in standard C.

I might be missing something, but I don't believe this is true? The type aliasing rules have an explicit carve-out for `char *` aliases of other types, which is intended to solve this exact issue.

Similarly, C99 and C11 both allow aliasing through a union of pointers without violating the strict aliasing rule.

1 comments

You can't implement `malloc()` in C because the pointers it returns are defined to be pointers to different objects, and if you can see the implementation of `malloc` then you know it's returning pointers to the same object (the page it gets from mmap/sbrk). There's no functionality in C to actually do what it says except for the function itself.
My understanding is that this is covered by the "effective type" language in both C99 and C11[1].

malloc itself is specified to return a void pointer, but the effective type established during assignment circumvents what would otherwise be UB via aliasing[2].

Edit: Specifically, the reasoning is:

1. Allocated objects have no declared type;

2. 6.5.6: Objects with no declared type have the type of their accessing lvalue

3. 6.5.7: Access of a stored value is valid for lvalue expressions that are type compatible with the effective type.

So there's no UB here. `malloc` itself doesn't need to know about the effective type produced by the lvalue, and C (at least C99 onwards) respects that type for strict aliasing purposes.

[1]: https://port70.net/%7Ensz/c/c11/n1570.html#6.5p6

[2]: https://port70.net/%7Ensz/c/c11/n1570.html#6.5p7

The issue in C99 is:

> The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

If the implementation of malloc is visible, then it can be inlined etc., and then you have to deal with the effects of this obviously not being true.

It would have been better, if the writer had not used the word “object”.
Here's an outline of a toy version:

    #include <stdlib.h> /* for size_t */
    static char data_storage[10000000];
    static char *nextp = data_storage;
    
    void *malloc(size_t size)
    {
        void *p = nextp;
        nextp += size;  
        return p;
    }
    void free(void *p){}
This does not "yield a pointer to an object disjoint from any other object". So it only works if the implementation is not visible to callers and they can pretend it's compliant.
Huh? My version never returns a pointer to space that was already allocated because it never deallocates anything. So the objects are ALWAYS disjoint from every other object. Or are you concerned that they are aliased with the static 'data_storage' char array?

More context around your quote:

> The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

https://port70.net/%7Ensz/c/c11/n1570.html#7.22.3

In context, I take that to mean that all allocated objects must be disjoint from one another throughout their lifetime. It wouldn't make sense otherwise.

> So the objects are ALWAYS disjoint from every other object. Or are you concerned that they are aliased with the static 'data_storage' char array?

Yes, plus if you call it twice it returns two pointers to the same object, because there's no way in C to create another "object". Of course, as long as the callers don't know this it's fine, but it would be a problem if eg you compile your custom malloc with ASAN/Valgrind and don't tell it that it's a malloc.

I think C++ partially addresses this with "placement new" but not sure how far.

> if you call it twice it returns two pointers to the same object

How are you defining an object? This draft version of the C11 spec defines 'object' as a "region of data storage in the execution environment, the contents of which can represent values". https://port70.net/%7Ensz/c/c11/n1570.html#3.15

You never could implement malloc() in C anyway, because the language lacks the ability to manipulate the MMU.
C runs on platforms such as microcontrollers that don't have MMU's. Malloc is available on those platforms.
The point isn't that malloc is available, rather that it is impossible to write in ISO C, other than mapping a global memory segment.
It's strict aliasing that's the issue, not the mapping of memory to a address space. You run into the same issues with a global segment.
Both are a problem, because they can only be implemented outside ISO C semantics.

That is why I tend to always write ISO C instead of C.