| What would be an unnecessary allocation? A program generally does something with the memory it allocates, and at that point the allocation becomes necessary. Specifically, it makes a lot of sense to optimize system allocators. If you can't rely on a fast system allocator, you are inclined to build your own pools and arenas on top of malloc() and write all the management boilerplate yourself, with your own bugs on top. This also eats up time that you would otherwise spend on writing your application. Oh yeah, you're also likely to end up slower than any of the allocators published since the 2000s or so. On the other hand, if the system has a modern slab-style allocator that is cache-aware and automatically pools similarly sized objects, you get all that basically for free by calling malloc() and free() in a dumb and "unnecessary" way, over and over. Applications need to manage their memory somehow, hence the allocations and deallocations. Optimizing the system memory allocator pays off as long as you never see the allocator hogging too many cycles in your profiler. If you can get away with lots of malloc() and free() calls because a smart allocator ideally turns them into mere pointer bumps, you win. Custom memory management is generally useful in highly optimized loops where you just can't pay the cost of an occasional bookkeeping round inside malloc() or free(), or in cases where you can spend some to save some. Then you might want to manage your own pool so that you can guarantee every operation is O(1). Alternatively, your program might benefit from a pattern where all the memory is allocated sequentially and never freed until the very end of the operation. Processing one request or running one cycle of operation might be examples. |
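For anyone who hasn't seen the "allocate sequentially, free everything at the end" pattern from the quoted comment, here is a rough sketch in C. The names (`Arena`, `arena_init`, `arena_alloc`, `arena_release`) are made up for illustration and don't come from any particular library. It assumes the requested alignment is a power of two no larger than what malloc() itself guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena: one malloc() up front, one free() at the end. */
typedef struct {
    unsigned char *base;  /* start of the backing block */
    size_t cap;           /* total bytes available */
    size_t used;          /* bytes handed out so far */
} Arena;

/* Returns 1 on success, 0 if the backing malloc() fails. */
static int arena_init(Arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = cap;
    a->used = 0;
    return a->base != NULL;
}

/* O(1): round the current offset up to `align`, then bump it by `size`. */
static void *arena_alloc(Arena *a, size_t size, size_t align) {
    /* Assumption: align is a nonzero power of two. */
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->cap || size > a->cap - start)
        return NULL;  /* out of space; a fuller arena might chain a new block */
    a->used = start + size;
    return a->base + start;
}

/* Releases every allocation at once, e.g. at the end of a request. */
static void arena_release(Arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = 0;
    a->used = 0;
}
```

There is no per-object free at all, which is the whole point: the only bookkeeping is a single offset, and cleanup is one free() when the request or cycle is done.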
The thing about optimizing allocators is that you very quickly run out of ways to make them faster without a space tradeoff. Notice that the first optimization detailed in the article was to retain a 'free page' instead of returning it to the kernel: this speeds up allocation and deallocation, but increases memory usage. And if retaining one page makes it faster, why not two, or three, or fifty?
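To make the "one page, or two, or fifty" knob concrete, here is a minimal sketch in C. It is not the article's code: the names (`page_get`, `page_put`, `max_cached`, `kernel_calls`) are hypothetical, and malloc()/free() merely stand in for whatever the real allocator would use to get pages from and return pages to the kernel.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE   4096
#define CACHE_SLOTS 64            /* hard upper bound on retained pages */

static void  *cached[CACHE_SLOTS];
static size_t n_cached = 0;
static size_t max_cached = 1;     /* the knob: keep this <= CACHE_SLOTS */
static size_t kernel_calls = 0;   /* trips to the stand-in "kernel" */

/* Hand out a page, preferring one we retained earlier. */
static void *page_get(void) {
    if (n_cached > 0)
        return cached[--n_cached];    /* fast path: no kernel round trip */
    kernel_calls++;
    return malloc(PAGE_SIZE);         /* assumption: stand-in for mapping a page */
}

/* Take a page back; retain it if the cache has room, else give it up. */
static void page_put(void *page) {
    if (n_cached < max_cached && n_cached < CACHE_SLOTS) {
        cached[n_cached++] = page;    /* costs PAGE_SIZE bytes of idle memory */
        return;
    }
    kernel_calls++;
    free(page);                       /* assumption: stand-in for unmapping */
}
```

Raising `max_cached` cuts `kernel_calls` for bursty get/put patterns, but every retained page is memory the process holds without using it.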
The same is true for that slab-style allocator. The basic idea is to split memory into separate pools, one per object size, and only allocate from the pool for the requested size. If that pool is full, it gets enlarged, even if other pools have enough free space to satisfy the allocation. That's wasted memory!
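Here is a toy version of that behavior in C, just to show where the waste comes from. Everything in it (the three size classes, `SLOTS_PER_CHUNK`, `pool_alloc`, `pool_free`) is invented for illustration; real slab allocators are far more careful, and this sketch never returns chunks to the system.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES     3
#define SLOTS_PER_CHUNK 4

/* Hypothetical size classes; each one gets its own pool. */
static const size_t class_size[NUM_CLASSES] = {16, 64, 256};

/* A free slot doubles as a node in its pool's free list. */
typedef struct Slot { struct Slot *next; } Slot;

typedef struct {
    Slot  *free_list;
    size_t reserved;   /* bytes this pool has taken from malloc() */
} Pool;

static Pool pools[NUM_CLASSES];

/* Smallest class that fits `size`, or -1 if none does. */
static int class_for(size_t size) {
    for (int i = 0; i < NUM_CLASSES; i++)
        if (size <= class_size[i])
            return i;
    return -1;
}

/* Enlarge one pool by a chunk of SLOTS_PER_CHUNK slots. */
static int pool_grow(Pool *p, size_t slot_size) {
    unsigned char *chunk = malloc(slot_size * SLOTS_PER_CHUNK);
    if (chunk == NULL)
        return 0;
    for (size_t i = 0; i < SLOTS_PER_CHUNK; i++) {
        Slot *s = (Slot *)(chunk + i * slot_size);
        s->next = p->free_list;
        p->free_list = s;
    }
    p->reserved += slot_size * SLOTS_PER_CHUNK;
    return 1;
}

static void *pool_alloc(size_t size) {
    int c = class_for(size);
    if (c < 0)
        return NULL;   /* oversized requests are out of scope for this sketch */
    Pool *p = &pools[c];
    /* Only this class's pool is consulted; free slots elsewhere are ignored. */
    if (p->free_list == NULL && !pool_grow(p, class_size[c]))
        return NULL;
    Slot *s = p->free_list;
    p->free_list = s->next;
    return s;
}

/* The caller passes the original size so the slot returns to the right pool. */
static void pool_free(void *ptr, size_t size) {
    int c = class_for(size);
    if (c < 0 || ptr == NULL)
        return;
    Slot *s = ptr;
    s->next = pools[c].free_list;
    pools[c].free_list = s;
}
```

Allocate and free one 200-byte object and the 256-byte pool sits on 1024 idle bytes; a fifth 10-byte allocation still makes the 16-byte pool grow rather than touch them. Both alloc and free stay O(1), and that speed is paid for with exactly the space overhead described above.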
So we have to balance speed against memory usage. The right balance is domain-specific: that Java app running on a monster server can afford to waste lots of memory to speed up allocations, while the allocator on my iPhone needs to be much more mindful of its space overhead.