Hacker News
by Rolcol 2169 days ago
What size is a memory page on other ARM CPUs? I think Apple's processors use 16KiB pages. Doesn't x86 software assume a 4KiB page size, unless it deals with huge pages?
4 comments

According to [1] at 14:00, the final hardware will support 4KiB pages, but the DTK only supports 16KiB pages.

[1] https://developer.apple.com/videos/play/wwdc2020/10686/

They all support 4KiB, 64KiB, and 1MiB. It's required by the ARM spec, obviously with the exception of CPUs that don't have an MMU. Support for 16MiB pages is optional.
And for 64-bit ARM, the base page sizes are 4KiB, 16KiB, and 64KiB, with IIRC 16KiB being optional. If you want 52-bit physical addresses, you need to use 64KiB base page size, otherwise the maximum is 48-bit physical addresses; this is probably why RHEL uses 64KiB page size on 64-bit ARM.
64KiB is also a better match for today's working sets to avoid TLB pressure. I imagine x86 would switch to it or something close if it weren't such a schlep.
Another advantage of a 64KiB page size is that it allows for a bigger L1. The L1 is usually VIPT (for good reasons), and to prevent confusing issues with aliases, it means that its maximum size is a single page per cache way. For an 8-way cache, that means the L1 can be at most 32KiB with a 4KiB page size; with a 64KiB page size, even a 2-way L1 cache could have up to 128KiB.
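
To put numbers on that limit, here is a rough Python sketch. The function name `max_vipt_l1_bytes` is made up for illustration; it only encodes the rule stated above (at most one page per cache way), not any particular CPU's design.

```python
KIB = 1024

def max_vipt_l1_bytes(ways: int, page_size: int) -> int:
    """Largest alias-free VIPT cache: each way can span at most one page."""
    if ways <= 0 or page_size <= 0:
        raise ValueError("ways and page_size must be positive")
    return ways * page_size

# The two configurations mentioned in the comment above.
for ways, page in [(8, 4 * KIB), (2, 64 * KIB)]:
    size = max_vipt_l1_bytes(ways, page)
    print(f"{ways}-way, {page // KIB}KiB pages: up to {size // KIB}KiB L1")
```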
openSUSE started with 64KiB as well, but that was found to have massive memory overhead with smaller files, so it was switched back to 4KiB.
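
A back-of-the-envelope way to see that overhead, assuming cached file data is held in whole pages so every file rounds up to a page boundary. The file sizes below are invented for illustration, not measurements from openSUSE, and `cached_bytes` is a hypothetical helper.

```python
KIB = 1024

def cached_bytes(file_size: int, page_size: int) -> int:
    """Memory used if the file is cached in whole pages (rounded up)."""
    pages = -(-file_size // page_size)  # ceiling division
    return pages * page_size

# Hypothetical small-file sizes in bytes, purely for illustration.
files = [200, 1500, 3 * KIB, 10 * KIB]

for page in (4 * KIB, 64 * KIB):
    used = sum(cached_bytes(f, page) for f in files)
    print(f"{page // KIB}KiB pages: {used // KIB}KiB cached "
          f"for {sum(files)} bytes of data")
```

With mostly small files, nearly all of each 64KiB page is padding, which is the kind of overhead described above.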
Yeah, see e.g. the Linus Torvalds rants about the "optimal" base page size.

You can probably make a good case for the optimal base page size being larger than 4 kB today, but probably not by very much. Maybe 16 kB or so. But then it's not a huge advantage over 4 kB, which has the benefit of compatibility, so, meh...

You're conflating two separate things. There's the hardware-level page size, which is the logical unit of the MMU. On x86, this is always 4k (huge pages aside). The kernel maps MMU-level pages into the address space of running processes. As an optimization, it might always map these MMU-level pages in batches. The batch size then acts as the virtual page size; for instance, it could always map in batches of 4, for a 16k virtual page. But the CPU-level page size is completely fixed.
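
The batching idea comes down to plain address arithmetic. A minimal sketch, assuming 4k hardware pages and a batch of 4 as in the example above; `hw_pages_for` is a hypothetical name, and this is not how any particular kernel implements it.

```python
HW_PAGE = 4 * 1024           # hardware (MMU) page size assumed above
BATCH = 4                    # hardware pages always mapped together
VIRT_PAGE = HW_PAGE * BATCH  # resulting 16k "virtual" page

def hw_pages_for(addr: int) -> list:
    """Base addresses of the hardware pages backing the virtual page holding addr."""
    base = addr - (addr % VIRT_PAGE)  # round down to the 16k boundary
    return [base + i * HW_PAGE for i in range(BATCH)]

# One 16k virtual page is backed by four consecutive 4k hardware pages.
print([hex(p) for p in hw_pages_for(0x12345)])
```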
Page size is virtualized when running under Rosetta 2.
Partially. Electron already has an issue open for the DTK because Chromium doesn’t like it: https://github.com/electron/electron/issues/24319