by pizlonator
2239 days ago
First of all, let’s be clear: for a lot of algorithms, adding call overhead around them is free because the CPU predicts the call, predicts the return, and uses register renaming to handle the arguments and return value. But that’s not the whole story. Malloc perf is also about what happens on free and what happens when the program accesses the memory that malloc gave it. When you factor all of that in, it doesn’t matter how many instructions malloc executes. What matters is whether those instructions form a bad dependency chain, whether they miss cache, whether the memory we return is in the “best” place, and how much work happens in free (judged by the same metrics: dependency chain length and misses, not total instruction count or whether there’s a call).
That only covers the hardware layer. There are also effects at the software layer.
Function calls that LTO can't see into prevent all sorts of compiler optimizations as well. In addition, as Jeff says on this thread, not being able to inline free means you get no benefit from sized-delete, which is a substantial improvement on some workloads. (I'd cite the exact number, but I'm not sure Google ever released it.)
Source: I literally worked on this.
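To make the sized-delete point concrete, here is a minimal C++ sketch of what "sized" means at the language level. It is not taken from any real allocator: `Node`, `last_freed_size`, and `free_one_node` are hypothetical names made up for illustration, and the `malloc`/`free` bodies stand in for whatever the allocator would actually do. The only thing it shows is that the compiler already knows the object size at the `delete` site and can hand it to the deallocation function, so the allocator does not have to look it up.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical type, used only to illustrate sized deallocation.
struct Node {
    int payload[4];

    // Records the size the compiler passed to operator delete (illustration only).
    static std::size_t last_freed_size;

    static void* operator new(std::size_t size) {
        void* p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Sized form: the delete-expression passes sizeof(Node) as `size`.
    // A real allocator could use it to pick the size class directly
    // instead of deriving it from the pointer; here we only record it.
    static void operator delete(void* p, std::size_t size) noexcept {
        last_freed_size = size;
        std::free(p);
    }
};

std::size_t Node::last_freed_size = 0;

// Allocates and frees one Node, then reports the size seen by operator delete.
std::size_t free_one_node() {
    Node* n = new Node;
    delete n;
    return Node::last_freed_size;
}
```

Calling `free_one_node()` returns `sizeof(Node)`. If the deallocation function sits behind a call the compiler can't see into, that size is just one more argument; if it can be inlined, the size is a compile-time constant at the call site and any size-dependent logic can fold away, which is presumably where the benefit described above comes from.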