| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by JonChesterfield 554 days ago

It's mostly terminology and conventions. On the standard system setup, the linux kernel running in a special processor mode does these things. Linux userspace asks the kernel to do stuff using syscall and memory which both kernel and userspace can access. E.g. the io_uring register followed by writing packets into the memory.

What the GPU has is read/write access to memory that the CPU can also access. And network peripherals etc. You can do things like alternately compare-and-swap on the same page from x64 threads and amdgpu kernels and it works, possibly not quickly on some systems. That's also all that the x64 CPU threads have though, modulo the magic syscall instruction to ask the kernel to do stuff.

People sometimes get quite cross at my claim that the GPU can do fprintf. Cos actually all it can do is write numbers into shared memory or raise interrupts such that the effect of fprintf is observed. But that's also all the userspace x64 threads do, and this is all libc anyway, so I don't see what people are so cross about. You're writing C, you call `fprintf(stderr, "Got to L42\n");` or whatever, and you see the message on the console.

If fprintf compiles into a load of varargs mangling with a fwrite underneath, and the varargs stuff runs on the GPU silicon and the fwrite goes through a staging buffer before some kernel thread deals with it, that seems fine.

I'm pretty sure you could write to an nvme drive directly from the gpu, no talking to the host kernel at all, at which point you've arguably implemented (part of?) a driver for it. You can definitely write to network cards from them, without using any of this machinery.

1 comments

saagarjha 553 days ago

We don't actually allow a GPU to directly fprintf, because GPU can't syscall. Only userspace can do that. You can have userspace keep polling and then do it on behalf of the GPU, but that's not the GPU doing it.

link

adrian_b 553 days ago

The GPU could do the equivalent of fprintf, if the concerned peripherals used only memory-mapped I/O an the IOMMU would be configured to allow the GPU to access directly those peripherals, without any involvement from the OS kernel that runs on the CPU.

This is the same as on the CPU, where the kernel can allow a user process to access directly a peripheral, without using system calls, by mapping that peripheral in the memory space of the user process.

In both cases the peripheral must be assigned exclusively to the GPU or the user process. What is lost by not using system calls is the ability to share the peripheral between multiple processes, but the performance for the exclusive user of the peripheral can be considerably increased. Of course, the complexity of the user process or GPU code is also increased, because it must include the equivalent of the kernel device driver for that peripheral.

link

jhuber6 553 days ago

At some point I was looking into using io_uring for something like this. The uring interface just works off of `mmap()` memory, which can be registered with the GPU's MMU. There's a submission polling setting, which means that the GPU can simply write to the pointer and the kernel will eventually pick up the write syscall associated with it. That would allow you to use `snprintf` locally into a buffer and then block on its completion. The issue is that the kernel thread goes to sleep after some time, so you'd still need a syscall from the GPU to wake it up. AMD GPUs actually support software level interrupts which could be routed to a syscall, but I didn't venture too deep down that rabbit hole.

link

fulafel 553 days ago

File I/O would be a can of worms. But "i want to use fprintf" specifically: Stdio files don't need to be backed by Unix FD's. See eg https://www.gnu.org/software/libc/manual/html_node/Other-Kin... an eg fmemopen().

link