by roenxi
774 days ago
An interesting alternative question: "how necessary is ROCm when working with an APU?". CUDA's advantage seemed to me to come mostly from memory management and task scheduling being so poor on AMD cards. If AMD has engineered that problem out of the system, we might be able to get away with using 3rd party libraries instead of these vendor-promoted frameworks.
In practice, if you go down that road on discrete GPU systems, allocating "fine grain" memory so you can talk to the GPU is probably the most tedious part of the setup. I gave up around there. An APU should be indifferent to that though.
There will be some setup to associate your CPU process with the GPU. That's permissions-style work, since Linux doesn't let processes stomp on each other. It might be rather minimal and should be spelled out in roct.
Launching a kernel involves finding the part of the address space the GPU is watching, writing a 64-byte packet to it and then "ringing a doorbell", which is probably a write to a different magic address. There's a lot of cruft in the API from earlier generations, where these things involved a lot of work.
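To make the shape of that concrete, here's a rough sketch simulated in ordinary memory, so it runs without a GPU. Everything in it is an assumption for illustration: `dispatch_packet`, `sim_queue` and `sim_submit` are made-up names, the field layout is only loosely modelled on an HSA AQL kernel dispatch packet, and the "doorbell" is a plain atomic variable standing in for whatever address the hardware actually watches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte dispatch packet. Field names are illustrative,
 * loosely modelled on an HSA AQL kernel dispatch packet. */
typedef struct {
    uint16_t header;             /* packet type; written last to publish */
    uint16_t setup;
    uint16_t workgroup_size[3];
    uint16_t reserved0;
    uint32_t grid_size[3];
    uint32_t private_segment_size;
    uint32_t group_segment_size;
    uint64_t kernel_object;      /* address of the compiled kernel */
    uint64_t kernarg_address;    /* address of the argument buffer */
    uint64_t reserved1;
    uint64_t completion_signal;
} dispatch_packet;

_Static_assert(sizeof(dispatch_packet) == 64, "packet must be 64 bytes");

enum { QUEUE_SLOTS = 8 };        /* power of two, so indices wrap cheaply */

/* Simulated queue: on real hardware the ring lives in memory the GPU
 * reads, and the doorbell is a mapped device address, not a variable. */
typedef struct {
    dispatch_packet  ring[QUEUE_SLOTS];
    uint64_t         write_index;   /* next slot the producer fills */
    _Atomic uint64_t doorbell;      /* stand-in for the doorbell write */
} sim_queue;

/* Copy one packet into the ring, then "ring the doorbell" by storing
 * the packet's index. No full-queue check: a real producer would
 * compare against the consumer's read index before reusing a slot. */
static uint64_t sim_submit(sim_queue *q, const dispatch_packet *pkt)
{
    assert(q != NULL && pkt != NULL);

    uint64_t idx = q->write_index++;
    dispatch_packet *slot = &q->ring[idx % QUEUE_SLOTS];

    /* Fill in the body first, leaving the header untouched. */
    memcpy((char *)slot + sizeof slot->header,
           (const char *)pkt + sizeof pkt->header,
           sizeof *slot - sizeof slot->header);

    /* Make the body visible before the header marks the slot valid. */
    atomic_thread_fence(memory_order_release);
    slot->header = pkt->header;

    /* Tell the (simulated) consumer there is work up to idx. */
    atomic_store_explicit(&q->doorbell, idx, memory_order_release);
    return idx;
}
```

Using it is just a case of zero-initialising a `sim_queue`, filling in a `dispatch_packet` and calling `sim_submit(&q, &pkt)`. The only real design point is the ordering: body first, header second, doorbell last, so that whatever is watching the queue never sees a half-written packet.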
Game plan for finding out goes something like:
Roct is a small C library that implements the userspace side of the kernel driver. I'd be inclined to link it into your application rather than drop it entirely, but ymmv. Rocr / HSA is a larger C++ library that has a lot more moving parts and is more tempting to drop from the dependency graph.
Going beyond that, you could build a simplified version of the kernel driver that drops support for all the other hardware. Might make things better, might not. And beyond that there's the firmware on the GPU, which might be getting more accessible soon, but iiuc it's written in assembly so might not be that much fun to hack on. And beyond that you're on the silicon, where changing it really means making a different chip.