Hacker News
by Veserv 356 days ago
MMU hardware has a hardware-defined minimum mapping granularity (i.e., the page size). On x86-64 this is 4KB. On ARMv8/v9, the specification allows the minimum mapping granularity to be configured at runtime, on a per-process basis, as 4KB, 16KB, or 64KB, though a chip implementation may support only a subset of these configurations and still be conformant.
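As a rough illustration (added here, not part of the original comment), the sketch below shows what a minimum mapping granularity means for a tiny request: the MMU can only map whole granules (ARM calls this the translation granule), so even 1 byte costs a full page. The function names are hypothetical; the sizes are the three ARMv8/v9 options above, with 4KB also being the x86-64 base page size.

```python
# Sketch: how a byte-sized request maps onto MMU page granularity.
# Assumes the granule is a power of two, as 4KB, 16KB and 64KB all are.

GRANULES = {"4KB": 4 * 1024, "16KB": 16 * 1024, "64KB": 64 * 1024}


def round_up_to_granule(size: int, granule: int) -> int:
    """Round size up to the next multiple of a power-of-two granule."""
    if granule <= 0 or granule & (granule - 1):
        raise ValueError("granule must be a positive power of two")
    # Adding granule-1 and then clearing the low bits rounds up without division.
    return (size + granule - 1) & ~(granule - 1)


def pages_needed(size: int, granule: int) -> int:
    """Number of granule-sized pages that must be mapped to cover size bytes."""
    return round_up_to_granule(size, granule) // granule


if __name__ == "__main__":
    for name, granule in GRANULES.items():
        mapped = round_up_to_granule(1, granule)
        print(f"{name}: a 1-byte request maps {mapped} bytes "
              f"({pages_needed(1, granule)} page)")
```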

Implementations may also define certain sizes that are more efficient. On modern x86-64, large pages may allow 2MB and 1GB mappings to be more efficient (each size is 512 times the previous one; the pattern would continue to 512GB, but current chips stop at 1GB). On modern ARMv8/v9 there is similar large page support (the exact sizes depend on the minimum mapping granularity), and chips may also support aligned, contiguous mappings which make other sizes efficient (the exact details are even more complicated than simple large pages and highly optional).
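To make the factor-of-512 pattern concrete, here is a sketch assuming x86-64 4-level paging with 48-bit virtual addresses and 4KB base pages (function names are hypothetical). It splits a virtual address into its four table indices and computes how much memory a single entry covers when the page walk stops early at a given level. `mapping_size(3)` is included only to show where the arithmetic leads.

```python
# Sketch of x86-64 4-level paging address decomposition (48-bit virtual addresses).
# Each table has 512 entries (9 index bits), so each level up covers 512x more memory.

PAGE_SHIFT = 12          # 4KB base pages
INDEX_BITS = 9           # 512 entries per table
LEVELS = ["PML4", "PDPT", "PD", "PT"]


def split_virtual_address(vaddr: int) -> dict:
    """Return the per-level table indices and the 4KB page offset."""
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    for depth, name in enumerate(LEVELS):
        # PML4 index starts at bit 39, PDPT at 30, PD at 21, PT at 12.
        shift = PAGE_SHIFT + INDEX_BITS * (len(LEVELS) - 1 - depth)
        parts[name] = (vaddr >> shift) & ((1 << INDEX_BITS) - 1)
    return parts


def mapping_size(levels_above_pt: int) -> int:
    """Bytes covered by one entry that ends the page walk.

    0 -> PT entry (4KB page), 1 -> PD entry (2MB page),
    2 -> PDPT entry (1GB page); 3 would be 512GB (arithmetic only).
    """
    return (1 << PAGE_SHIFT) << (INDEX_BITS * levels_above_pt)


if __name__ == "__main__":
    for level, label in enumerate(["4KB page (PT)", "2MB page (PD)", "1GB page (PDPT)"]):
        print(label, mapping_size(level))
    print(split_virtual_address(0x7F12_3456_7ABC))
```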

As such, if you want just "1 byte", you create a userspace allocator that requests memory from the OS in units that conform to the limits of the MMU; it is then the allocator's job to abstract that away for you by chopping that memory up under its own auspices.
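A toy version of such an allocator might look like the following sketch. The class and method names are hypothetical; it uses Python's standard `mmap` module to obtain an anonymous mapping sized in whole pages (`mmap.PAGESIZE`), then hands out 1-byte pieces by bumping an offset. It never frees or reuses memory, which real allocators must do.

```python
import mmap


class BumpAllocator:
    """Hypothetical toy allocator: requests whole pages from the OS with an
    anonymous mmap, then hands out arbitrarily small pieces of that region.

    Real malloc implementations also free and reuse memory, align blocks,
    and map more memory as needed; this sketch does none of that.
    """

    def __init__(self, pages: int = 1):
        if pages <= 0:
            raise ValueError("pages must be positive")
        # The OS/MMU deals in whole pages, so the arena is sized in pages.
        self.capacity = pages * mmap.PAGESIZE
        self.arena = mmap.mmap(-1, self.capacity)  # anonymous mapping
        self.next_free = 0

    def alloc(self, size: int) -> int:
        """Reserve size bytes and return their offset within the arena."""
        if size <= 0:
            raise ValueError("size must be positive")
        if self.next_free + size > self.capacity:
            raise MemoryError("arena exhausted")
        offset = self.next_free
        self.next_free += size  # "bump" past the bytes just handed out
        return offset

    def close(self) -> None:
        self.arena.close()


if __name__ == "__main__":
    heap = BumpAllocator(pages=1)
    a = heap.alloc(1)  # just 1 byte
    b = heap.alloc(1)
    heap.arena[a:a + 1] = b"x"
    heap.arena[b:b + 1] = b"y"
    print(a, b, heap.arena[a:b + 1], heap.capacity - heap.next_free)
    heap.close()
```

Returning offsets instead of pointers is a simplification of this sketch; an allocator written in C would return addresses inside the mapping.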

1 comment

I suppose having some sort of minimum granularity for the MMU makes a lot of sense, but above that point, why limit the "steps" or the maximum? Why not 4KB+2 (4098 bytes), for example? For the sake of discussion, let's say I have 128GB of RAM and a 50TB page file; can that work? Why is a fixed-size page more efficient? Is it because the offset calculations are simpler arithmetic operations? I figured the TLB is (or can be) a simple (large) table implemented in hardware, the simplicity translating into O(N) complexity. I suspect it might have to do with hardware cost?
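The arithmetic part of this question can be made concrete with a small sketch (an illustration added here, not an answer given in the thread; the helper names are hypothetical). With a power-of-two page size, the page number and offset come from a shift and a mask, i.e. simply splitting the address bits, while a 4098-byte page needs a genuine division for every translation. The sketch says nothing about TLB organization or overall hardware cost.

```python
# Illustration only: how a page number and offset are extracted from an address.

def split_pow2(vaddr: int, page_shift: int = 12) -> tuple[int, int]:
    """Power-of-two page (2**page_shift bytes): a shift and a mask suffice.

    In hardware this amounts to routing the high address bits to the
    page-number side and the low bits to the offset side.
    """
    return vaddr >> page_shift, vaddr & ((1 << page_shift) - 1)


def split_any(vaddr: int, page_size: int) -> tuple[int, int]:
    """Arbitrary page size (e.g. 4098 bytes): a genuine division is required."""
    return divmod(vaddr, page_size)


if __name__ == "__main__":
    addr = 0x12345
    print(split_pow2(addr), split_any(addr, 4096))  # identical results
    print(split_any(addr, 4098))                    # needs divide hardware
```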