Hacker News new | ask | show | jobs
by astrange 1571 days ago
You can't implement `malloc()` in C because the pointers it returns are defined to be pointers to different objects, and if you can see the implementation of `malloc` then you know it's returning pointers to the same object (the page it gets from mmap/sbrk). There's no functionality in C to actually do what it says except for the function itself.
3 comments

My understanding is that this is covered by the "effective type" language in both C99 and C11[1].

malloc itself is specified to return a void pointer, but the effective type established during assignment circumvents what would otherwise be UB via aliasing[2].

Edit: Specifically, the reasoning is:

1. Allocated objects have no declared type;

2. 6.5.6: Objects with no declared type have the type of their accessing lvalue

3. 6.5.7: Access of a stored value is valid for lvalue expressions that are type compatible with the effective type.

So there's no UB here. `malloc` itself doesn't need to know about the effective type produced by the lvalue, and C (at least C99 onwards) respects that type for strict aliasing purposes.

[1]: https://port70.net/%7Ensz/c/c11/n1570.html#6.5p6

[2]: https://port70.net/%7Ensz/c/c11/n1570.html#6.5p7

The issue in C99 is:

> The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

If the implementation of malloc is visible, then it can be inlined etc., and then you have to deal with the effects of this obviously not being true.

It would have been better, if the writer had not used the word “object”.
Here's an outline of a toy version:

    #include <stdlib.h> /* for size_t */
    static char data_storage[10000000];
    static char *nextp = data_storage;
    
    void *malloc(size_t size)
    {
        void *p = nextp;
        nextp += size;  
        return p;
    }
    void free(void *p){}
This does not "yield a pointer to an object disjoint from any other object". So it only works if the implementation is not visible to callers and they can pretend it's compliant.
Huh? My version never returns a pointer to space that was already allocated because it never deallocates anything. So the objects are ALWAYS disjoint from every other object. Or are you concerned that they are aliased with the static 'data_storage' char array?

More context around your quote:

> The lifetime of an allocated object extends from the allocation until the deallocation. Each such allocation shall yield a pointer to an object disjoint from any other object.

https://port70.net/%7Ensz/c/c11/n1570.html#7.22.3

In context, I take that to mean that all allocated objects must be disjoint from one another throughout their lifetime. It wouldn't make sense otherwise.

> So the objects are ALWAYS disjoint from every other object. Or are you concerned that they are aliased with the static 'data_storage' char array?

Yes, plus if you call it twice it returns two pointers to the same object, because there's no way in C to create another "object". Of course, as long as the callers don't know this it's fine, but it would be a problem if eg you compile your custom malloc with ASAN/Valgrind and don't tell it that it's a malloc.

I think C++ partially addresses this with "placement new" but not sure how far.

> if you call it twice it returns two pointers to the same object

How are you defining an object? This draft version of the C11 spec defines 'object' as a "region of data storage in the execution environment, the contents of which can represent values". https://port70.net/%7Ensz/c/c11/n1570.html#3.15

The drafting is not the best. The key point is that objects have bounds (even though it doesn't explicitly say it), and pointers point to a base object, and it's UB if those pointers go outside the bounds of that object.

malloc is defined to return a pointer to a new "base object". But your code doesn't do that; you can see by reading it that it returns a pointer to `data_storage`. That means the UB conditions for using that pointer don't match the spec.

You could say it's supposed to magically work if the function is named `malloc`, but my understanding of all C implementations is they don't do that.

You never could implement malloc() in C anyway, because the language lacks the ability to manipulate the MMU.
C runs on platforms such as microcontrollers that don't have MMU's. Malloc is available on those platforms.
The point isn't that malloc is available, rather that it is impossible to write in ISO C, other than mapping a global memory segment.
It's strict aliasing that's the issue, not the mapping of memory to a address space. You run into the same issues with a global segment.
Both are a problem, because they can only be implemented outside ISO C semantics.

That is why I tend to always write ISO C instead of C.

One's not a blocker to a malloc implementation since you could otherwise just cut up a static char buffer. Coordinating with the OS is a nice to have. Strict aliasing is hard blocker. Hence why that's the issue under discussion.

And I'm not sure I understand why you're using the inability to describe certain actions as the reason why you use ISO C.