I had a system that got 30%+ faster on Windows by switching from HeapAlloc to jemalloc. Profiling showed that HeapAlloc was largely stuck on a single giant lock. (This was on Windows Server 2016, IIRC.) And the system wasn't even that allocation-heavy in the grand scheme of things; most memory was handled through arena allocations, but a few larger buffers were not.
So much so that beating it with a custom allocator is a real challenge.