by zokier 660 days ago
I wonder if there is something that could be done at the language design level to have better "sympathy" for memory allocation, i.e. building on mmap/munmap as primitives instead of malloc/free, where language patterns are built around allocating pages instead of arbitrarily sized objects. Probably not practical for general high-level languages, but for e.g. embedded or high-performance stuff it might make sense?
5 comments

This seems to fail to understand that we already have both levels.

Every OS will provide some mechanism to get more pages. But it turns out that managing the use of those pages requires specialized handling, depending on the use case, as well as a bunch of boilerplate. Hence, we also have malloc and its many, many cousins to allocate arbitrary size objects.

You're always welcome to use brk(2) or your OS's equivalent if you just want pages. The question is, what are you going to do with each page once you have it? That's where the next level comes in ...
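
To make "what are you going to do with each page" concrete, here is a minimal sketch of the simplest possible next level: a bump allocator carving small objects out of one anonymous mapping. `PageBump` and its methods are made-up names for illustration, and Python's `mmap` module stands in for the raw syscall.

```python
import mmap


class PageBump:
    """Hypothetical bump allocator over one anonymous mapping."""

    def __init__(self, pages=1):
        self.size = pages * mmap.PAGESIZE
        # fileno -1 asks for an anonymous mapping (no backing file).
        self.mem = mmap.mmap(-1, self.size)
        self.top = 0  # offset of the first unused byte

    def alloc(self, nbytes, align=8):
        """Return the offset of a fresh nbytes region inside the mapping."""
        start = (self.top + align - 1) & ~(align - 1)  # align must be a power of two
        end = start + nbytes
        if end > self.size:
            raise MemoryError("mapping exhausted")
        self.top = end
        return start

    def reset(self):
        """Free every object at once; individual frees are not supported."""
        self.top = 0
```

Usage is along the lines of `off = arena.alloc(5); arena.mem[off:off + 5] = b"hello"`. Everything a real malloc adds on top (size classes, per-object free, growing past one mapping) is the "specialized handling" part.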

In general for embedded, you don't page memory even if you're running something like embedded Linux.

For high performance stuff where you need low, predictable latency, you're probably not going to want to use dynamic memory at all.

Not exactly what you're getting at, but you could maybe imagine an explicit version of malloc where allocations are destined either for thread-local only use, or shared use. Then locally freeing remote thread-local memory is an invalid operation and these kinds of assume-locality optimizations are valid on many structures. I think you can imagine a version of mmap that allows for thread-local mappings to help detect accidental misuse of local allocation.
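
A rough sketch of what that explicit split could look like, purely as an illustration: `ExplicitAllocator`, `alloc_local` and `alloc_shared` are invented names, and the ownership check is done in software here rather than by a thread-local mapping.

```python
import threading


class Block:
    def __init__(self, size, owner):
        self.data = bytearray(size)
        self.owner = owner  # owning thread ident, or None for shared blocks


class ExplicitAllocator:
    """Hypothetical allocator where the caller declares local vs. shared use."""

    def __init__(self):
        self._local = threading.local()
        self._shared_free = []
        self._lock = threading.Lock()

    def alloc_local(self, size):
        # Assumes the owning thread outlives the block (idents can be reused).
        return Block(size, threading.get_ident())

    def alloc_shared(self, size):
        return Block(size, None)

    def free(self, block):
        if block.owner is None:
            with self._lock:  # shared blocks pay for synchronization
                self._shared_free.append(block)
        elif block.owner == threading.get_ident():
            free_list = getattr(self._local, "free", None)
            if free_list is None:
                free_list = self._local.free = []
            free_list.append(block)  # no lock needed: only the owner gets here
        else:
            raise RuntimeError("remote free of a thread-local block")
```

The point of the sketch is the last branch: because a remote free is an error instead of a supported case, the local path never needs a lock or an atomic.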

Zig passes allocators around explicitly. There is no implicit memory allocator.

The downside is that it makes things like "print" a pain in the ass.

The upside is that you can have multiple memory allocators with hugely different characteristics (arena for per frame resources, bump allocator for network resources, etc.).
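
Not Zig, but the shape of the idea can be sketched in Python: every function that needs memory takes an allocator argument. `Arena`, `CountingHeap` and `format_greeting` are hypothetical names, and this only mirrors the pattern, not Zig's actual `std.mem.Allocator` interface.

```python
class Arena:
    """Hands out slices of one buffer; everything is freed together by reset()."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.used = 0

    def alloc(self, n):
        if self.used + n > len(self.buf):
            raise MemoryError("arena full")
        start = self.used
        self.used += n
        return memoryview(self.buf)[start:start + n]

    def reset(self):
        self.used = 0  # e.g. once per frame


class CountingHeap:
    """Stand-in for a general-purpose allocator; counts how often it is hit."""

    def __init__(self):
        self.calls = 0

    def alloc(self, n):
        self.calls += 1
        return memoryview(bytearray(n))


def format_greeting(allocator, name):
    """Even a formatting helper has to be told where its memory comes from."""
    text = f"hello, {name}".encode()
    out = allocator.alloc(len(text))
    out[:] = text
    return out
```

Calling `format_greeting(frame_arena, "zig")` versus `format_greeting(heap, "zig")` shows both sides: the extra parameter is the annoyance, the freedom to pick the allocator per call site is the payoff.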

Most modern memory allocators use mmap internally, which is why it often makes sense not to use the system allocator for long-running programs.

Generally, page size isn't something you know at compile time (or even at install time); it can vary between restarts and can be anywhere from ~4 KiB to 1 GiB. Most natural memory objects are much smaller than 4 KiB, but some are potentially much larger than 1 GiB, so you kind of don't want to leak anything related to page sizes into your business logic if it can be helped. If you still need to, most languages have memory/allocation pools you can use to get a bit more control over memory allocation, freeing, and reuse.
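
As a small illustration of why the page size has to be a runtime value, here is a sketch that queries it and rounds a request up to whole pages. `page_size` and `round_up_to_pages` are made-up helper names; `mmap.PAGESIZE` is the standard-library constant and reports the base page size, not whatever huge-page size a mapping may end up using.

```python
import mmap


def page_size():
    """Base page size as reported by the OS when the program runs."""
    return mmap.PAGESIZE


def round_up_to_pages(nbytes, page=None):
    """Smallest multiple of the page size that holds nbytes."""
    page = page or page_size()
    return (nbytes + page - 1) // page * page


# A 100-byte object costs a whole page either way:
small = round_up_to_pages(100, page=4096)           # 4096 bytes
huge = round_up_to_pages(100, page=2 * 1024 ** 2)   # 2097152 bytes
```

The same 100-byte request wastes about 4 KiB on one configuration and about 2 MiB on another, which is the kind of detail you'd rather keep out of business logic.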

Also, the performance issues mentioned don't have much to do with memory pages or anything like that; _instead they are rooted in concurrency controls of a global resource (memory)_, i.e. thread-local synchronization vs. process-wide synchronization.

Mainly, instead of using a fully general-purpose allocator, they used an allocator which is still general purpose but has a design bias that improves same-thread (de)allocation performance at the cost of cross-thread (de)allocation performance. And they were doing a ton of cross-thread (de)allocations, leading to noticeable performance degradation.
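
A structural sketch of that kind of design bias, under heavy assumptions: fixed-size blocks, one heap per thread, and invented names (`ThreadHeap`, `free`). It is not the allocator from the article, and in Python the GIL hides the real cost difference; the sketch only shows where the lock sits.

```python
import threading


class ThreadHeap:
    """Hypothetical per-thread heap handing out fixed-size blocks."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.owner = threading.get_ident()
        self.free_list = []  # touched only by the owner thread, no lock
        self.remote = []     # blocks freed by other threads
        self.remote_lock = threading.Lock()
        self.remote_frees = 0

    def alloc(self):
        if not self.free_list:
            # Slow path: collect whatever other threads handed back.
            with self.remote_lock:
                self.free_list, self.remote = self.remote, []
        block = self.free_list.pop() if self.free_list else bytearray(self.block_size)
        return (self, block)  # the handle remembers which heap owns the block


def free(handle):
    heap, block = handle
    if heap.owner == threading.get_ident():
        heap.free_list.append(block)  # fast path: same thread, no synchronization
    else:
        with heap.remote_lock:        # slow path: cross-thread free
            heap.remote.append(block)
            heap.remote_frees += 1
```

A workload that allocates on one thread and frees on another goes through the locked branch every single time, which is the pattern described above.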

The thing is, even if you hypothetically only had allocations at sizes that are multiples of a memory page, or used a ton of manual mmap, you would still want to use an allocator and not always return freed memory directly to the OS, as doing so, and making a syscall on every allocation, tends to lead to major performance degradation (in many use cases). So you still need concurrency controls, but they come at a cost, especially for cross-thread synchronization. Even lock-free controls based on atomics have a cost over thread-local controls, often caused largely by cache invalidation/synchronization.
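
To illustrate the "don't hand every freed page straight back to the OS" part, here is a minimal sketch of a page pool that caches released mappings behind a lock. `PagePool` and `os_maps` are hypothetical names, and the cache limit is an arbitrary choice.

```python
import mmap
import threading


class PagePool:
    """Hypothetical pool of single-page mappings shared by all threads."""

    def __init__(self, max_cached=8):
        self._cache = []
        self._lock = threading.Lock()  # the shared cache needs concurrency control
        self.max_cached = max_cached
        self.os_maps = 0  # how many times the OS was actually asked for memory

    def acquire(self):
        with self._lock:
            if self._cache:
                return self._cache.pop()  # reuse: no syscall, contents are stale
            self.os_maps += 1
        return mmap.mmap(-1, mmap.PAGESIZE)

    def release(self, page):
        with self._lock:
            if len(self._cache) < self.max_cached:
                self._cache.append(page)
                return
        page.close()  # cache is full: only now give the page back to the OS
```

Even with page-sized allocations only, the pool still has a lock on its hot path, so the cross-thread cost does not go away; it just moves from the kernel into the allocator.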