|
|
|
|
|
by torrance
1483 days ago
|
|
Develop against CUDA locally. Port my kernels to ROCm, and occupy a whole HPC node for debugging and performance tuning for a week. It’s terrible. Edit: I should say that their recommendation is to write the kernels in ‘hip’ which is supposed to be their cross device wrapper for both cuda or ROCm. I’m writing in Julia however so that’s not possible. |
|