Hacker News
by pzmarzly 660 days ago
> However, since it may be used elsewhere, a better solution is to replace the default allocator with one that uses malloc and free instead of new and delete.

C++ noob here, but is libc++'s default allocator (I mean, the default implementation of new and delete) actually doing something different than calling libc's malloc and free under the hood? If so, why?

Not the strongest on C++ myself, but a new[] expression first calls operator new[] to get the memory and then runs a constructor on each element. A delete[] expression runs the destructor for each element before calling operator delete[] to free the memory.

In order for delete[] to work, C++ must track the allocation size somewhere. This could be co-located with the allocation (at ptr - sizeof(size_t), for example), or it could live in some other structure. Using another structure lowers the odds of it getting trampled if/when something writes to memory beyond an object, but it comes with a lookup cost and extra code to manage that structure.
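The "some other structure" option can be sketched with a hash map kept on the side. This is a minimal illustration under assumptions, not how any particular standard library does it: count_table, side_new_array and side_delete_array are hypothetical names, and the map insert/lookup on every allocation and free is the lookup cost in question.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>
#include <unordered_map>

// Hypothetical side table: element counts live in a separate map keyed by the
// array address, instead of in a header stored just before the elements.
std::unordered_map<void*, std::size_t>& count_table() {
    static std::unordered_map<void*, std::size_t> table;
    return table;
}

// Allocate raw storage, construct n elements, then record n in the side table.
template <typename T>
T* side_new_array(std::size_t n) {
    T* first = static_cast<T*>(::operator new[](n * sizeof(T)));
    for (std::size_t i = 0; i < n; ++i) {
        ::new (static_cast<void*>(first + i)) T();
    }
    count_table()[first] = n;  // extra hash-map insert on every allocation
    return first;
}

// Look the count up again, destroy the elements in reverse order, then free.
template <typename T>
void side_delete_array(T* first) {
    if (first == nullptr) return;
    auto it = count_table().find(first);  // extra hash-map lookup on every free
    std::size_t n = it->second;
    count_table().erase(it);
    for (std::size_t i = n; i > 0; --i) {
        first[i - 1].~T();
    }
    ::operator delete[](first);
}
```

A stray write past the end of the array cannot corrupt the count here, but every allocation and deallocation pays for a map operation (and a real version would also need locking for threads).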

I'm sure proper C++ libraries are doing even more, but you get the idea: new and delete are not the same as malloc and free.
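A quick way to see the difference is to count constructor and destructor calls on a toy type (Widget and the two helper functions are hypothetical names). Calling operator new[] directly only hands back raw bytes, while the new[]/delete[] expressions also construct and destroy every element. This is a sketch of the language rules, not of any library's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Toy type that counts how often it is constructed and destroyed.
struct Widget {
    static int constructed;
    static int destroyed;
    Widget() { ++constructed; }
    ~Widget() { ++destroyed; }
};
int Widget::constructed = 0;
int Widget::destroyed = 0;

// Raw allocation function only: bytes in, bytes out, no constructors or destructors.
void raw_allocation_only(std::size_t n) {
    void* bytes = ::operator new[](n * sizeof(Widget));
    ::operator delete[](bytes);
}

// Expression form: allocate, construct each element; then destroy each element, free.
void expression_form(std::size_t n) {
    Widget* arr = new Widget[n];
    delete[] arr;
}
```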

> In order for delete[] to work, C++ must track the allocation size somewhere.

That is super-interesting, I had never considered this, but you're absolutely right. I am now incredibly curious how the standard library implementations do this. I've heard normal malloc() sometimes colocates data in similar ways; I wonder if C++ then "doubles up" on that metadata. Or maybe the standard library has its own entirely custom allocator that doesn't use malloc() at all? I can't imagine that's true, because you'd want to be able to swap system allocators with e.g. LD_PRELOAD (especially for Valgrind and stuff). They could also just be tracking it "to the side" in some hash table or something, but that seems bad for performance.

new[] and delete[] both know the type of the object. Therefore both know whether a destructor needs to be called.

When no destructor needs to run (e.g., new int[]), operator new[] is asked to allocate N*sizeof(T) bytes. The code stores off no metadata, and the result of operator new[] is the array address.

When a destructor does need to run (e.g., new std::string[]), operator new[] is asked to allocate sizeof(size_t)+N*sizeof(T) bytes. The code stores the item count in the size_t, adds sizeof(size_t) to the pointer returned by operator new[], uses that as the address of the array, and calls T() on each item. delete[] does the opposite: it fishes out the size_t, calls ~T() on each item, subtracts sizeof(size_t) from the array address, and passes the result to operator delete[] to free the buffer.
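The cookie scheme described above can be imitated by hand. This is a sketch with hypothetical helpers (cookie_new_array, cookie_count, cookie_delete_array), not compiler output: unlike the compiler it always writes the cookie, it pads the cookie to alignof(T) (one of the alignment details), it assumes alignof(T) does not exceed the default new alignment, and it leaves out exception safety during construction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Cookie is at least a size_t, padded so the elements after it stay aligned.
template <typename T>
constexpr std::size_t cookie_size() {
    return std::max(sizeof(std::size_t), alignof(T));
}

template <typename T>
T* cookie_new_array(std::size_t n) {
    // One request to operator new[]: room for the count plus n elements.
    char* raw = static_cast<char*>(::operator new[](cookie_size<T>() + n * sizeof(T)));
    ::new (static_cast<void*>(raw)) std::size_t(n);  // store the item count
    T* first = reinterpret_cast<T*>(raw + cookie_size<T>());
    for (std::size_t i = 0; i < n; ++i) {
        ::new (static_cast<void*>(first + i)) T();  // construct each element
    }
    return first;  // callers only ever see the address past the cookie
}

template <typename T>
std::size_t cookie_count(const T* first) {
    // Step back over the cookie to read the stored count.
    const char* raw = reinterpret_cast<const char*>(first) - cookie_size<T>();
    return *reinterpret_cast<const std::size_t*>(raw);
}

template <typename T>
void cookie_delete_array(T* first) {
    if (first == nullptr) return;
    std::size_t n = cookie_count(first);
    for (std::size_t i = n; i > 0; --i) {
        first[i - 1].~T();  // destroy in reverse order, as delete[] does
    }
    ::operator delete[](reinterpret_cast<char*>(first) - cookie_size<T>());
}
```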

(There are also some additional things to cater for: null checks, alignment, and so on. Just details.)

Note that operator new[] is not given any information about whether a destructor needs to run, or whether there is any metadata being stored off. It just gets called with a byte count. Exercise caution when using placement operator new[], because a preallocated buffer of N*sizeof(T) may not be large enough.
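One way to observe that operator new[] only receives a byte count is a class-specific operator new[] that records its argument. Assumptions: C++17, and the names LogsArrayRequests, Plain, WithDtor and array_overhead are hypothetical. On GCC or Clang targeting the Itanium C++ ABI one would expect no overhead for Plain (trivial destructor) and sizeof(std::size_t) extra bytes for WithDtor, but the standard leaves the overhead unspecified, so treat those numbers as platform-dependent.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Records the byte count that a new[] expression passes to operator new[].
template <typename Derived>
struct LogsArrayRequests {
    static inline std::size_t last_request = 0;  // C++17 inline static member

    static void* operator new[](std::size_t bytes) {
        last_request = bytes;  // a byte count is all this function is told
        return ::operator new[](bytes);
    }
    static void operator delete[](void* p) noexcept { ::operator delete[](p); }
};

struct Plain : LogsArrayRequests<Plain> { int x; };                // trivial destructor
struct WithDtor : LogsArrayRequests<WithDtor> { std::string s; };  // non-trivial destructor

// Extra bytes requested beyond n * sizeof(T), i.e. the array allocation overhead.
template <typename T>
std::size_t array_overhead(std::size_t n) {
    T* arr = new T[n];
    delete[] arr;
    return T::last_request - n * sizeof(T);
}
```

A placement buffer sized as exactly N*sizeof(T) leaves no room for whatever overhead this reports.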

jemalloc and tcmalloc use size classes, so if you allocate 23 bytes the allocator reserves 32 bytes of space on your behalf. Both of them can find the size class of a pointer with simple manipulation of the pointer itself, not with some global hash table. E.g. in tcmalloc the pointer belongs to a "page" and every pointer on that page has the same size.
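The size-class idea can be sketched as follows. The class table, the 4 KiB page size and ToyHeap are made up for illustration and are NOT jemalloc's or tcmalloc's real data structures. The point is only that a request is rounded up (23 bytes lands in the 32-byte class here) and that the size can be recovered from the page a pointer lives on, with no per-object header.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical size classes, for illustration only.
constexpr std::array<std::size_t, 8> kSizeClasses = {8, 16, 32, 48, 64, 96, 128, 256};

// Round a request up to the smallest class that fits; 0 means "not a small allocation".
std::size_t size_class_for(std::size_t request) {
    for (std::size_t c : kSizeClasses) {
        if (request <= c) return c;
    }
    return 0;
}

// Toy page map: all objects carved out of one page share a size class,
// so the size of any pointer follows from its page number.
constexpr std::size_t kPageShift = 12;  // assumed 4 KiB pages
constexpr std::size_t kNumPages = 16;

struct ToyHeap {
    unsigned char arena[kNumPages << kPageShift];
    std::size_t page_class[kNumPages] = {};  // 0 = page not in use

    std::size_t class_of(const void* p) const {
        auto offset = static_cast<std::size_t>(static_cast<const unsigned char*>(p) - arena);
        return page_class[offset >> kPageShift];
    }
};
```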
That doesn’t help C++ if you allocated an array of objects with destructors. It has to know that you allocated 23 objects so that it can call 23 destructors, not 32, 9 of which would run on uninitialized memory.
I believe the question was more around how the program knows how much memory to deallocate. The compiler generates the destructor calls the same way the compiler generates everything else in the program.
Isn't it also possible for other logic to run in a destructor, such as freeing pointers to external resources? Doesn't that mean, at the very least, that more advanced logic can run beyond freeing the object's own memory?
Yes, it usually is. See, e.g., smart pointers.
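A small RAII sketch of a destructor doing more than freeing memory. ResourceTable and Handle are hypothetical stand-ins for something like file descriptors or sockets. Because delete[] (here issued by std::unique_ptr<Handle[]>) runs every element's destructor, the resource count drops back to zero when the array goes away.

```cpp
#include <cassert>
#include <memory>

// Hypothetical external resource table (C++17 inline static member).
struct ResourceTable {
    static inline int open_count = 0;
    static void acquire() { ++open_count; }
    static void release() { --open_count; }
};

// RAII wrapper: its destructor releases the external resource, not just memory.
class Handle {
public:
    Handle() { ResourceTable::acquire(); }
    ~Handle() { ResourceTable::release(); }
    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;
};
```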
realloc is the same, as the old memory needs to be copied to the new memory.
ISO C++ doesn't require new and delete default implementations to call down into malloc()/free().

Many implementations do so only because malloc()/free() are already there, which makes them the easy thing to reach for.

No, modulo the aligned allocation overloads, but applications are allowed to override the default standard library operator new with their own, even on platforms that don't have an equivalent to ELF symbol interposition.
That doesn't really explain where the dependency on the C++ runtime comes from, though. As far as I know the dependency chain is std::allocator -> operator new -> malloc, but according to the post the replacement only strips out the `operator new`.

Notably I thought the issue would be the throwing of `std::bad_alloc`, but the new version still implements std::allocator, and throws bad_alloc.

And so I assume the issue is that the global `operator new` is concrete (it just takes the size of the allocation), thus you need to link to the C++ runtime just to get that function? In which case you might be able to get the same gains by redefining the global `operator new` and `operator delete`, without touching the allocator.
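Redefining the global functions could look roughly like this sketch, assuming the goal is malloc/free underneath and aborting rather than throwing std::bad_alloc on failure. It only covers the plain and sized forms; the nothrow and std::align_val_t overloads are not replaced, so those would still come from the runtime. Whether this alone removes the runtime from the link depends on what else the program uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Replacement global allocation functions built on malloc/free.
void* operator new(std::size_t size) {
    if (void* p = std::malloc(size == 0 ? 1 : size)) {  // malloc(0) may return nullptr
        return p;
    }
    std::abort();  // policy assumed here: abort instead of throwing std::bad_alloc
}
void* operator new[](std::size_t size) { return ::operator new(size); }

void operator delete(void* p) noexcept { std::free(p); }
void operator delete[](void* p) noexcept { std::free(p); }
// Sized deallocation forms (C++14); the size is simply ignored.
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
void operator delete[](void* p, std::size_t) noexcept { std::free(p); }
```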

Alternatively, you might be able to statically link the C++ runtime and have dead-code elimination (DCE) take care of the rest.

> Notably I thought the issue would be the throwing of `std::bad_alloc`, but the new version still implements std::allocator, and throws bad_alloc.

The new version uses `FMT_THROW` macro instead of a bare throw. The article says "One obvious problem is exceptions and those can be disabled via FMT_THROW, e.g. by defining it to abort". If you check the `g++` invocation, that's exactly what the author does.

The author also compiles with `-fno-exceptions` which should already have the same behaviour.
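The idea behind a throw macro like FMT_THROW can be sketched with a hypothetical MY_THROW (this is not fmt's actual definition). GCC and Clang define __cpp_exceptions when exceptions are enabled and reject bare throw expressions under -fno-exceptions, so routing every throw site through one macro lets an exception-free build swap in std::abort().

```cpp
#include <cassert>
#include <cstdlib>
#include <stdexcept>

#ifndef MY_THROW
#  if defined(__cpp_exceptions)
#    define MY_THROW(x) throw x
#  else
#    define MY_THROW(x) std::abort()  // the exception object is never constructed
#  endif
#endif

// Example throw site that compiles either way.
int checked_div(int a, int b) {
    if (b == 0) MY_THROW(std::invalid_argument("division by zero"));
    return a / b;
}
```

Because of the #ifndef guard, a build could also supply its own definition on the command line, e.g. `-D'MY_THROW(x)=std::abort()'` with GCC.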
Yes, they could have just defined their own global operator new/delete to have a micro-runtime, same as you'd do if you were writing a kernel in C++. Super easy, barely an inconvenience.
Changing global new/delete is a non-starter in a reusable library. Allocator is a much more localized change and roughly the same amount of work.
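For comparison, the allocator-based route might look like this minimal sketch. malloc_allocator is a hypothetical name and this is not fmt's actual allocator: it calls std::malloc/std::free directly and aborts on failure instead of throwing std::bad_alloc. It assumes T is not over-aligned, since malloc only guarantees alignment suitable for std::max_align_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

template <typename T>
struct malloc_allocator {
    using value_type = T;

    malloc_allocator() = default;
    template <typename U>
    malloc_allocator(const malloc_allocator<U>&) noexcept {}  // allows rebinding

    T* allocate(std::size_t n) {
        if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) std::abort();  // overflow
        void* p = std::malloc(n == 0 ? 1 : n * sizeof(T));  // malloc(0) may return nullptr
        if (p == nullptr) std::abort();  // abort instead of throwing std::bad_alloc
        return static_cast<T*>(p);
    }
    void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless: any two instances can free each other's memory.
template <typename T, typename U>
bool operator==(const malloc_allocator<T>&, const malloc_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const malloc_allocator<T>&, const malloc_allocator<U>&) noexcept { return false; }
```

The change stays local: only containers that name malloc_allocator are affected, and the program's global operator new/delete are left alone.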
The main point of replacing it with malloc is that new will throw std::bad_alloc, so using it requires linking against the C++ runtime.
Only if you're not using the nothrow placement syntax, `new (std::nothrow) T`.
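For reference, the nothrow form looks like this (make_zeroed_buffer is a hypothetical helper). It removes the need to handle std::bad_alloc, but std::nothrow and the nothrow overload of operator new[] are declared in <new> and, on typical toolchains, defined in the C++ runtime library, so whether the runtime dependency disappears depends on the toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Nothrow new[] expression: on allocation failure it evaluates to nullptr
// instead of throwing std::bad_alloc, so the caller must check the result.
int* make_zeroed_buffer(std::size_t n) {
    return new (std::nothrow) int[n]();  // () value-initializes the ints to 0
}
```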