In the past I've sometimes achieved speedups of 20% or better by writing custom allocators (that's overall performance, not just malloc performance). tcmalloc and jemalloc are great for the general case, but sometimes you know invariants about object sizes, allocation patterns and free patterns that allow much faster allocation.
The simplest case is if you know that you will free everything at once, or nothing at all. This allows you to eliminate most bookkeeping and allows a completely lock-free architecture. But there are also more complex cases where you can still get big benefits from exploiting known invariants.
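To make the "free everything at once" case concrete, here is a minimal sketch of a bump (arena) allocator in C. The names `arena`, `arena_init`, `arena_alloc`, `arena_reset` and `arena_destroy` are made up for illustration, and the sketch assumes one arena per thread, which is what lets it get away without any locks or atomics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bump ("arena") allocator; all names are illustrative. */
typedef struct {
    unsigned char *base;  /* start of the backing block */
    size_t cap;           /* total bytes in the block */
    size_t used;          /* bytes handed out so far */
} arena;

/* Grab one block up front. Returns 1 on success, 0 on failure. */
static int arena_init(arena *a, size_t cap) {
    assert(a != NULL && cap > 0);
    a->base = malloc(cap);
    a->cap = (a->base != NULL) ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Allocation is a rounded-up pointer bump: no headers, no free lists. */
static void *arena_alloc(arena *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);
    if (start > a->cap || n > a->cap - start)
        return NULL;  /* out of space */
    a->used = start + n;
    return a->base + start;
}

/* "Free everything at once": just rewind the offset. */
static void arena_reset(arena *a) { a->used = 0; }

/* Hand the backing block back to the system allocator. */
static void arena_destroy(arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

There is no per-object free here at all, which is where the bookkeeping savings come from. If you need to share one arena between threads you would have to make the bump atomic or add a lock, so this only fits when the per-thread assumption holds.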
Or even the standard memory allocator provided by your system. I'm pretty sure this article was meant as a way to understand how malloc works and not as a high-performance replacement for the one you're currently using.
It's fairly common to avoid naked malloc()/free() in systems with real-time requirements. Memory pools are a great way to go if you want deterministic behavior and better reliability.
This. If you're going to malloc/free equally sized objects often but randomly, a pool allocator can be a great improvement.
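For anyone who hasn't seen one, a fixed-size pool can be sketched roughly like this. `pool`, `pool_init`, `pool_alloc`, `pool_free`, `POOL_SLOTS` and `SLOT_SIZE` are hypothetical names and values picked for the example, and it is single-threaded as written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size pool; names and sizes are illustrative only. */
#define POOL_SLOTS 64
#define SLOT_SIZE  32   /* assumed object size; must be >= sizeof(void *) */

typedef union slot {
    union slot *next;               /* link while the slot is free */
    max_align_t align_;             /* forces worst-case alignment */
    unsigned char bytes[SLOT_SIZE]; /* payload while the slot is in use */
} slot;

typedef struct {
    slot  storage[POOL_SLOTS];  /* backing store lives inside the pool */
    slot *free_list;            /* head of the intrusive free list */
} pool;

/* Thread every slot onto the free list. */
static void pool_init(pool *p) {
    assert(p != NULL);
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        p->storage[i].next = p->free_list;
        p->free_list = &p->storage[i];
    }
}

/* O(1): pop the head of the free list, or NULL if the pool is empty. */
static void *pool_alloc(pool *p) {
    slot *s = p->free_list;
    if (s == NULL)
        return NULL;
    p->free_list = s->next;
    return s;
}

/* O(1): push the slot back. ptr must have come from pool_alloc on p. */
static void pool_free(pool *p, void *ptr) {
    slot *s = ptr;
    s->next = p->free_list;
    p->free_list = s;
}
```

Both operations are a couple of pointer moves regardless of the order you free things in, and because every slot is the same size the pool can't fragment. The free list reuses the slots' own memory, so there is no per-object header either.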
With real-time requirements you often care about your response time or worst-case execution time. In some areas of safety-critical embedded systems you're usually prohibited from using the heap at all. Instead, stuff is put in global variables or on the stack, so the only memory that grows at runtime is the stack, and it grows in just one direction.