by s_kanev
2239 days ago
This was the case a few years back, when the fastest pools were implemented with recursive data structures (e.g. linked lists for the freelists in gperftools). In the new tcmalloc (and, I think, Hoard?) the fastest pools are essentially slabs with bump allocation, so the fastest (and by far the most common) calls come to a grand total of 15 or so instructions, without many cache misses (size-class lookups tend to stay in the cache). At that size, call overhead can be a substantial chunk of the cost.
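Here is a minimal C sketch of the two fast-path styles, assuming a single size class and no thread safety. The names `free_node`, `freelist_pop`, `freelist_push`, `bump_slab` and `bump_alloc` are hypothetical. This is not tcmalloc's or gperftools' actual code. It only shows the structural difference: a freelist pop loads the next head from inside the object it returns, so back-to-back pops form a chain of dependent loads. A bump allocation is arithmetic on a cursor that never reads the returned memory.

```c
#include <stddef.h>

/* Hypothetical illustration only, not tcmalloc or gperftools code. */

/* Linked-list freelist: every free object stores a pointer to the next one. */
typedef struct free_node {
    struct free_node *next;
} free_node;

/* Pop: the new head is loaded from inside the object being handed out, so
   consecutive pops are a chain of dependent loads, and a cache miss on one
   object delays the next allocation. */
static void *freelist_pop(free_node **head) {
    free_node *n = *head;
    if (n == NULL)
        return NULL;
    *head = n->next;
    return n;
}

/* Push (the free path): writes into the freed object's own memory. */
static void freelist_push(free_node **head, void *p) {
    free_node *n = (free_node *)p;
    n->next = *head;
    *head = n;
}

/* Bump allocation from a slab for one hypothetical size class: a subtract,
   a compare and an add on a cursor; the returned memory is never read. */
typedef struct {
    char *cur;   /* next unused byte in the slab */
    char *end;   /* one past the end of the slab */
    size_t size; /* object size for this size class */
} bump_slab;

static void *bump_alloc(bump_slab *s) {
    if ((size_t)(s->end - s->cur) < s->size)
        return NULL; /* slab exhausted; a real allocator would refill here */
    void *p = s->cur;
    s->cur += s->size;
    return p;
}
```

For example, pushing four 32-byte objects and popping twice returns them in LIFO order, while a 128-byte slab with a 32-byte size class hands out four consecutive objects before `bump_alloc` returns `NULL`.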
But that’s not the whole story. Malloc perf is also about what happens on free, and what happens when the program accesses the memory malloc gave it.
When you factor all of that in, it doesn’t matter how many instructions malloc executes. What matters is whether those instructions form a bad dependency chain, whether they miss in the cache, whether the memory we return is in the “best” place, and how much work happens in free (judged by the same metrics: dependency chain length and misses, not the total number of instructions or whether there’s a call).