by clairebyte 529 days ago
>The act of writing a value of a different type tells the compiler that the lifetime of the previous object has ended.

afaik only memcpy has that magic property, so I think parent is almost correct.

  void *p = malloc(n);
  *(int *)p = 42; // ok, *p is now an int.
  //*(float *)p = 3.14f; // I think this is not allowed, p points to an int object, regular stores do not change effective type
  float x = 3.14f;
  memcpy(p, &x, sizeof(float)); // but this is fine, *p now has effective type float
So in pool_new:

  pool->chunk_arr[i].next = &pool->chunk_arr[i + 1];
This sets the effective type of the chunk block to 'Chunk'.

Later in pool_alloc:

  Chunk* result    = pool->free_chunk;
  ...
  return result;
result has effective type 'Chunk'

In user code:

  int *x = pool_alloc();
  *x = 42; // aliasing violation: *x has effective type 'Chunk' but is accessed as an int
User code would need to look like this:

  int *x = pool_alloc();
  memcpy(x, &(int){0}, sizeof(int)); // establish new effective type as 'int'
  // now we can do
  *x = 42;
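
For reference, a minimal self-contained sketch of the pattern described in this comment. The thread only shows fragments, so the `Chunk` and `Pool` layouts, the 16-byte payload, the `pool_new(count)` and `pool_alloc(pool)` signatures, and the `pool_demo` helper are all assumptions made for illustration. The user code follows this comment's conservative reading and uses memcpy before the first plain store.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: the real definitions are not shown in the thread. */
typedef union Chunk {
    union Chunk *next;        /* link while the chunk sits on the free list */
    unsigned char bytes[16];  /* payload space handed to the caller (assumed size) */
} Chunk;

typedef struct Pool {
    Chunk *chunk_arr;   /* backing storage obtained from malloc */
    Chunk *free_chunk;  /* head of the free list */
} Pool;

Pool *pool_new(size_t count) {
    if (count == 0) return NULL;
    Pool *pool = malloc(sizeof *pool);
    if (!pool) return NULL;
    pool->chunk_arr = malloc(count * sizeof(Chunk));
    if (!pool->chunk_arr) { free(pool); return NULL; }
    /* These stores are what give each block the effective type 'Chunk'. */
    for (size_t i = 0; i + 1 < count; i++)
        pool->chunk_arr[i].next = &pool->chunk_arr[i + 1];
    pool->chunk_arr[count - 1].next = NULL;
    pool->free_chunk = pool->chunk_arr;
    return pool;
}

void *pool_alloc(Pool *pool) {
    Chunk *result = pool->free_chunk;
    if (result) pool->free_chunk = result->next;  /* pop the free list */
    return result;
}

/* User code written the conservative way suggested in this comment. */
int pool_demo(void) {
    Pool *pool = pool_new(4);
    if (!pool) return -1;
    int *x = pool_alloc(pool);
    memcpy(x, &(int){0}, sizeof(int));  /* copy in an int before using *x as int */
    *x = 42;
    int value = *x;
    free(pool->chunk_arr);
    free(pool);
    return value;
}
```

Whether the memcpy step is actually required is disputed in the replies, so treat this as the cautious variant rather than the only correct one.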

And this is why type-based alias analysis (TBAA) is insane and why projects like Linux compile with -fno-strict-aliasing.

C should issue a defect report and get rid of that nonsense from the standard.

C doesn't have "alias analysis" in the standard. It has an (informally specified) memory model which has "memory objects" which have a single type, which means treating them as a different type is undefined behavior.

This enables security analysis like valgrind/ASan and secure hardware like MTE/CHERI so it's very important and you can't get rid of it.

However, it's not possible to implement malloc() in C because malloc() is defined as returning new "memory objects" and there is no C operation which creates "memory objects" except malloc() itself. So it only works as long as you can't see into the implementation, or if the compiler gives you special forgiveness somehow.

C++ has such an operation called "placement new", so you want something like that.

You can definitely implement malloc in C. It does nothing special in its most basic form but cough up void pointers into its own arena.

It gets complicated when you have virtual memory and an OS involved but even then you can override the system malloc with a simple implementation that allocates from a large static array.
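
A minimal sketch of the "allocate from a large static array" idea, written as a bump allocator with no free(). The names `arena`, `arena_used`, `arena_malloc` and the 4096-byte size are made up for illustration. Note that the array has the declared type `unsigned char[]`, so whether callers may store other types into the returned memory under strict aliasing is exactly the point being argued in this thread; the sketch does not settle that.

```c
#include <stddef.h>
#include <stdalign.h>

#define ARENA_SIZE 4096  /* arbitrary size chosen for this sketch */

/* Static backing store, aligned for any fundamental type. */
static alignas(max_align_t) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hand out the next suitably aligned slice of the arena, or NULL when full. */
void *arena_malloc(size_t n) {
    const size_t align = alignof(max_align_t);
    if (n == 0) n = 1;                 /* always return a distinct pointer */
    if (n > ARENA_SIZE) return NULL;   /* also keeps the rounding below from overflowing */
    size_t rounded = (n + align - 1) / align * align;
    if (rounded > ARENA_SIZE - arena_used) return NULL;
    void *p = &arena[arena_used];
    arena_used += rounded;
    return p;
}
```

Usage is the same as malloc without the matching free: `void *p = arena_malloc(24);` returns a pointer into `arena`, and a request that no longer fits returns NULL.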

No, returning parts of an array does not implement malloc as described in the standard. That's not a new memory object, it's a part of an existing one.
The standard is written to accommodate obsolete tagged memory architectures that require special support. They aren't relevant today and data pointers are fungible regardless of where they originate.
> data pointers are fungible regardless of where they originate.

This was never true because of something called provenance: https://www.ralfj.de/blog/2020/12/14/provenance.html. Though it usually doesn't matter and I think it annoys anyone who finds out about it.

But in practice it's not always true on Apple A12 or later because they support PAC (so pointers of different types to the same address may not be bit-wise equal), and it's even less true on the very latest Android because it supports the really big gun, MTE. And MTE is great; you don't want to miss out on it. No explainer here because there's no Wikipedia article for it(!).

It also stops being true on any system if you use -fbounds-safety or some of the sanitizers.

Morello is.
There are other issues besides changing the memory type. For instance, C has those rules about out of bounds pointers being undefined, but you can't implement that - if you return part of the pool and someone calculates an out of bounds address they're getting a valid address to the rest of the pool. That's why you can't implement malloc() in C.

(The difference here is that system malloc() works with valgrind, -fbounds-safety, theoretical secure hardware with bounds checking etc., and this one doesn't.)

Undefined behavior is behavior you can't avoid implementing, because no matter what your compiler and runtime do, it complies with the spec. In particular getting valid addresses to other objects from out-of-bounds address arithmetic is not just conformant with the C standard but by far the most common conforming behavior.
Meant to say you can't implement it as an invalid/trap state. This is possible in some implementations but they have to cooperate with you to do it.

> In particular getting valid addresses to other objects from out-of-bounds address arithmetic is not just conformant with the C standard but by far the most common conforming behavior.

One reason calculating out-of-bounds addresses might not work out is that the calculation might cause the pointer to overflow, and then surprising things might happen, like comparisons failing or tag bits in the high bytes getting corrupted.

Oh, then I agree. My apologies for interpreting you as saying something so obviously incorrect. Yes, in particular CHERI has a mechanism to shrink the bounds of a pointer, but just returning a pointer into an array won't do it.
> aliasing violation, *x has effective type 'Chunk'

This doesn't make any sense. How do you know its effective type if you don't have access to the definition of `pool_alloc()`?

If you can guarantee it's always compiled in a separate TU and never inlined, sure, that might be a practical way to 'solve' this issue, but if you then do some LTO (or do a unity build or something) the compiler might suddenly break your code. Another way is to add an inline asm block with a "memory" clobber to escape the pointer so the optimizer can't destroy your code.
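
A rough sketch of the inline asm idea, assuming GCC or Clang (extended asm is a compiler extension, not standard C). The helper names `launder_pointer` and `barrier_demo` are hypothetical. The empty asm takes the pointer as a read-write operand and declares a "memory" clobber, so the optimizer has to assume the pointer value and any memory may have changed. Whether that is enough is a property of the compiler, not something the C standard promises.

```c
#include <stdlib.h>

/* GCC/Clang only: pass the pointer through an empty asm statement.
 * "+r"(p) makes p an input and an output of the asm, and the "memory"
 * clobber tells the compiler the asm may read or write any memory. */
static inline void *launder_pointer(void *p) {
    __asm__ volatile("" : "+r"(p) : : "memory");
    return p;
}

/* Hypothetical allocator-style use: storage last written as a float is
 * handed out through the barrier and then used as an int. */
int barrier_demo(void) {
    void *raw = malloc(sizeof(float) > sizeof(int) ? sizeof(float) : sizeof(int));
    if (!raw) return -1;
    *(float *)raw = 3.14f;          /* allocator-side use of the block */
    int *x = launder_pointer(raw);  /* hide the pointer's history from the optimizer */
    *x = 42;
    int value = *x;
    free(raw);
    return value;
}
```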

It's really quite ridiculous that compiler implementers have managed to overzealously nitpick the C standard so that you can't implement a memory allocator in C.

> It's really quite ridiculous that compiler implementers have managed to overzealously nitpick the C standard so that you can't implement a memory allocator in C.

This is good because it's also what gives you valgrind and CHERI. Take one away and you can't have the other.

(Because if you define all undefined behavior, then programs will rely on it and you can't assume it's an error anymore.)

Any type-changing store to allocated storage has this property in C.
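
A small sketch of that reading, based on C11 6.5 paragraph 6: for storage with no declared type (such as a malloc result), a store through a non-character lvalue sets the effective type for that access and for later reads of the stored value. The function name `retype_demo` is made up for illustration.

```c
#include <stdlib.h>

/* Reuse one malloc'd block first as an int, then as a float.
 * Each read matches the type of the most recent store. */
float retype_demo(void) {
    void *p = malloc(sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float));
    if (!p) return -1.0f;

    *(int *)p = 42;                /* effective type of *p is now int */
    int as_int = *(int *)p;        /* read as int: matches */

    *(float *)p = 3.5f;            /* a plain store of a different type retypes the storage */
    float as_float = *(float *)p;  /* read as float: matches the latest store */

    free(p);
    return as_int == 42 ? as_float : -1.0f;
}
```

This only applies to allocated storage. An object with a declared type, such as a static array or a local variable, keeps that type no matter what is stored into it.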