|
|
|
|
|
by leiroigh
459 days ago
|
|
The main problem with that is that it doesn't play nice with most languages. Consider int foo(int* ptr) {
int x = ptr[1<<16];
*ptr += 1;
return x + ptr[1<<16];
}
Compilers/languages/specs tend to decide that `ptr` and `ptr + (1<<16)` cannot alias, and this can be compiled into e.g. foo(int*):
mov eax, dword ptr [rdi + 262144]
inc dword ptr [rdi]
add eax, eax
ret
which gives undesired results if `ptr` and `ptr + (1<<16)` happen to be mapped to the same physical address. This is also pretty shit to debug/test -- some day, somebody will enable LTO for an easy performance win on release builds, and bad code with a security vuln gets shipped. |
|
I do think though there are some downsides to this approach that may or may not be deal-breakers:
* Platform dependence. Each of the crates I mention has a fair bit of platform-specific `unsafe` code that only supports userspace on a few fixed OSs. They fundamentally can't work on microcontrollers with no MMU; I don't think WASM has this kind of flexibility either.
* Either setting up each buffer is a bit expensive (several system calls + faulting each page) or you have to do some free-listing on your own to mitigate. You can't just rely on the standard memory allocator to do it for you. Coincidentally just like last week I was saying freelisting is super easy for video frames where you have a nice bound on number of things in the list and a fixed size, but if you're freelisting these at the library level or something you might need to be more general.
* Buffer size constraints. Needs to be a multiple of the page size; some applications might want smaller buffers.
* Relatedly, extra TLB pressure, which is significant in many applications' performance. Not just because you have the same region mapped twice. Also that the buffer size constraints mentioned above make it likely you won't use huge pages, so on e.g. x86-64 you might use 4 KiB pages rather than 2 MiB (additional factor of 512x) or 1 GiB (additional factor of 262144x) as the memory allocator would help you do if they could be stuffed into the same huge page as other allocations.