HN Mirror


by dalias 2233 days ago

The justifications are partly the same as what Daniel Micay has written about extensively in the rationale for hardened_malloc (https://github.com/GrapheneOS/hardened_malloc): unsynchronized per-thread state inherently sacrifices global consistency for performance and makes it impossible to detect many types of memory usage errors (double free, use-after-free, etc.) that could otherwise be caught.

However, musl has the additional constraint of being compatible with small/very-low-memory environments. Lack of global consistency inherently means you will end up using memory less efficiently and requesting significantly more from the system. The new malloc about to go upstream in musl is, to my knowledge, the first/only advanced hardened allocator using a slab-type design rather than traditional dlmalloc-type split/merge, but also designed for extremely low overhead/waste at low to moderate usage rather than extreme performance. And in the vast majority of applications, this is perfectly reasonable. Even Firefox, for example, does very well with it.
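To make the slab idea concrete, here is a minimal sketch in C. It is not musl's code; the names `slab_group`, `slab_alloc`, `slab_free` and the constants are hypothetical, chosen only for illustration. One group holds equal-sized slots, and a bitmask kept outside the slots records which ones are free. Because that bitmask is the single source of truth, a second free of the same pointer is visible immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE  64   /* hypothetical size class, in bytes */
#define SLOT_COUNT 32   /* slots per group, one bit each in free_mask */

struct slab_group {
    uint32_t free_mask;                            /* bit i set => slot i is free */
    unsigned char storage[SLOT_COUNT][SLOT_SIZE];  /* the slots themselves */
};

static void slab_init(struct slab_group *g)
{
    assert(g != NULL);
    g->free_mask = UINT32_MAX;  /* every slot starts out free */
}

/* Hands out the lowest-numbered free slot, or NULL if the group is full. */
static void *slab_alloc(struct slab_group *g)
{
    for (int i = 0; i < SLOT_COUNT; i++) {
        uint32_t bit = (uint32_t)1 << i;
        if (g->free_mask & bit) {
            g->free_mask &= ~bit;
            return g->storage[i];
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if p is not a slot start in this group
 * or if the slot is already free (a double free). */
static int slab_free(struct slab_group *g, void *p)
{
    uintptr_t base = (uintptr_t)&g->storage[0][0];
    uintptr_t addr = (uintptr_t)p;

    if (addr < base || addr >= base + sizeof g->storage)
        return -1;                       /* pointer is outside this group */

    size_t off = (size_t)(addr - base);
    if (off % SLOT_SIZE != 0)
        return -1;                       /* pointer is not a slot start */

    uint32_t bit = (uint32_t)1 << (off / SLOT_SIZE);
    if (g->free_mask & bit)
        return -1;                       /* slot already free: double free */

    g->free_mask |= bit;
    return 0;
}
```

A real hardened allocator would abort rather than return an error, would protect the mask with a lock, and would typically avoid handing a just-freed slot straight back out. The sketch only shows why consistent, out-of-band metadata makes the check cheap.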

With that said, the new malloc is expected to be somewhat faster than the old one on lots of workloads (and considerably faster than the old one would be if we fixed the flaws in it that motivated the new one), but it's not a performance-oriented allocator. If you really want/need that, you should probably link jemalloc or similar (and accept all the tradeoffs that come with that). In Rust programs without "unsafe", it may make sense to do that by default.

2 comments

scott_s 2233 days ago

Thanks for the clear explanation. Looking at the source code, it looks similar to modern allocators, just without the per-thread heaps. (I think all modern allocators use size-class slab allocators for small objects.) Curiously, I don't think the academic community has much literature on hardened allocators. It's been a while since I've worked in the area, but I wasn't aware of any other than DieHard from 2006 [1]. I did some searching on the ACM Digital Library (I love that it's all free right now so I can easily provide links in forums), and the only other thing I could find was FreeGuard from 2017 [2]. Maybe the issue there is that academics who design memory allocators tend to be on the systems side of CS, and such people tend to use raw performance as a part of the evaluation. Better security for a new thing does not show up in a graph. (Even that FreeGuard paper from 2017 claims security with better performance.)
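For readers unfamiliar with the term, "size-class" allocation means rounding each small request up to one of a fixed set of sizes, so that every slab holds equal-sized slots. A rough sketch in C follows; the table values and the names `size_classes`, `size_class_index`, and `size_class_bytes` are made up for illustration and are not taken from any particular allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes in bytes; real allocators use their own tables. */
static const size_t size_classes[] = { 16, 32, 48, 64, 96, 128, 192, 256 };
#define NUM_CLASSES (sizeof size_classes / sizeof size_classes[0])

/* Returns the index of the smallest class that fits n bytes, or -1 if n is 0
 * or too large for the small-object path (the caller would handle those). */
static int size_class_index(size_t n)
{
    if (n == 0)
        return -1;
    for (size_t i = 0; i < NUM_CLASSES; i++) {
        if (n <= size_classes[i])
            return (int)i;
    }
    return -1;
}

/* Returns the slot size, in bytes, for a valid class index. */
static size_t size_class_bytes(int idx)
{
    assert(idx >= 0 && (size_t)idx < NUM_CLASSES);
    return size_classes[idx];
}
```

With this table, a 100-byte request lands in the 128-byte class, so 28 bytes of the slot go unused. That rounding waste is the price of never having to split or merge chunks.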

In the non-academic world, I found the one we're discussing, but also Scudo (https://llvm.org/docs/ScudoHardenedAllocator.html). And that's it. If I still worked in the area, I would try to go after scalable hardened allocators. I wonder if there's still some clever stuff we haven't thought of there.

[1] https://github.com/emeryberger/DieHard, https://dl.acm.org/doi/abs/10.1145/1133981.1134000

[2] https://github.com/UTSASRG/FreeGuard, https://dl.acm.org/doi/abs/10.1145/3133956.3133957


fluffything 2233 days ago

> However musl has the additional constraint of being compatible with small/very-low-memory environments.

How many threads do these have?

If they only have one thread, they'll use 72x less memory than if they had 72 threads.

The thing is that if you are using 72 threads, you probably would like your application to be 72x faster than if you were using only one. So synchronizing all allocations and killing scalability doesn't solve these users' problems.

Most allocators, including jemalloc, tcmalloc, mimalloc, etc. have a "hardened" mode, that people can opt into if they want.

If I'm using Rust like the user in the blog post, double frees are caught at compile-time, so I'd rather not pay for them at run-time.


jessermeyer 2233 days ago

Less than two weeks ago, the DragonFly kernel allocator made related improvements for CPUs with very high core counts.

https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/018...
