by almostgotcaught
556 days ago
> Well, I called it syscall because it's a void function of 8 u64 arguments which your code stumbles into, gets suspended, then restored with new values for those integers

I'll put it really simply: is there a difference (in perf, semantics, whatever) between using these "syscalls" to implement fopen on GPU and using a syscall to implement fopen on CPU? Note that's a rhetorical question because we both already know that the answer is yes. So again you're just playing sleight of hand in calling them syscalls, and I'll emphasize: this is a sleight of hand that the dev himself doesn't play (so why would I take your word over his).
|
If semantics are different, that's a bug/todo. It'll have worse latency than a CPU thread making the same kernel request. Throughput shouldn't be way off. The GPU writes some integers to memory that the CPU then needs to read; the CPU writes other integers back, and the GPU loads those. Plus whatever the x64 syscall itself does. That's a bunch of cache line invalidation and reads. It's not as fast as it would be if the hardware guys were on board with the strategy, but I'm optimistic it can be useful today and thus help justify changing the hardware/driver stack.
The whole point of libc is to paper over the syscall interface. If you start from musl, "syscall" can be a table of function pointers or asm. Glibc is more obstructive. This libc open-codes a bunch of things, with an rpc.h file dealing with synchronising the memcpy of arguments to/from threads running on the CPU, which get to call into the Linux kernel directly. It's mainly carefully placed atomic operations to keep the data accesses well defined.
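To make the shape of that concrete, here's a minimal sketch of the round trip using two ordinary CPU threads, one standing in for the GPU. This is not the actual rpc.h: `Mailbox`, `rpc_call`, `rpc_serve_one` and `demo_round_trip` are hypothetical names made up for illustration, there's a single slot instead of a pool of ports, and the "kernel request" is just a sum over the eight integers.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

using Args = std::array<uint64_t, 8>;

// Hypothetical single-slot mailbox shared by the client ("GPU") and the
// server (CPU). The real mechanism has many slots; this sketch has one.
struct Mailbox {
  Args data{};
  std::atomic<uint32_t> state{0};  // 0 = idle, 1 = request posted, 2 = reply posted
};

// Client side: write the 8 integers, publish them, wait, read the reply.
inline Args rpc_call(Mailbox &m, const Args &args) {
  m.data = args;
  m.state.store(1, std::memory_order_release);  // make the arguments visible
  while (m.state.load(std::memory_order_acquire) != 2)
    std::this_thread::yield();                  // "suspended" until the reply lands
  Args reply = m.data;
  m.state.store(0, std::memory_order_release);  // hand the slot back
  return reply;
}

// Server side: wait for one request, run the handler, publish the result.
template <typename Handler>
void rpc_serve_one(Mailbox &m, Handler handler) {
  while (m.state.load(std::memory_order_acquire) != 1)
    std::this_thread::yield();
  m.data = handler(m.data);                     // a real server would make the syscall here
  m.state.store(2, std::memory_order_release);  // make the reply visible
}

// One full round trip; the handler stands in for the kernel request.
inline uint64_t demo_round_trip() {
  Mailbox m;
  std::thread server([&m] {
    rpc_serve_one(m, [](Args a) {
      uint64_t sum = 0;
      for (uint64_t v : a) sum += v;
      a[0] = sum;  // the result goes back in the same 8 integers
      return a;
    });
  });
  Args args = {1, 2, 3, 4, 5, 6, 7, 8};
  Args reply = rpc_call(m, args);
  server.join();
  assert(reply[1] == 2);  // untouched arguments come back unchanged
  return reply[0];
}
```

The release/acquire pairs on `state` are what keep the plain accesses to `data` well defined: each side only touches the payload while it owns the slot. On real hardware the two sides sit on different devices sharing memory, so the cost is dominated by those cache line handoffs.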
There's also nothing in here which random GPU devs can't build themselves. The header files are (now) self-contained if people would like to use the same mechanism for other functionality and don't want to hand-roll the data structure. The most subtle part is getting this to work correctly under arbitrary warp divergence on Volta. It should be an out-of-the-box thing under OpenMP early next year too.