Ah, I was reading the 'deeper dive' section on my phone and missed that it was a comparison, not a warning. Thank you! I'm curious how something like this example would translate:

Mapping lower-level PTX patterns to higher-level AMD constructs like __ballot, and knowing it's safe

```
#ifdef INLINEPTX
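inline uint ptx_thread_vote(float rSq, float rCritSq) {
    uint result = 0;
    // Warp-wide vote: result is 1 only if rSq >= rCritSq holds on every lane in the mask.
    asm("{\n\t"
        ".reg .pred cond, out;\n\t"
        "setp.ge.f32 cond, %1, %2;\n\t"                  // cond = (rSq >= rCritSq)
        "vote.sync.all.pred out, cond, 0xffffffff;\n\t"  // out = cond is true on all 32 lanes
        "selp.u32 %0, 1, 0, out;\n\t"                    // result = out ? 1 : 0
        "}\n\t"
        : "=r"(result)
        : "f"(rSq), "f"(rCritSq));
    return result;
}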
#endif
```

Again, I'm guessing there might be an equivalent, simpler program involving AMD's __ballot, but I'm unsure of the true equivalence wrt safety. It also seems like a tricky rewrite, as it needs to (afaict) decompile to recover the higher-level abstraction. Normally it's easier to compile down or sideways (translate), and it's not clear to me that these primitives are 1:1 for safely doing so.

FWIW, this is all pretty cool. We stay away from PTX -- most of our app code is higher-level, whether RAPIDS (GPU dataframes, GPU ML, etc. libs), minimal CUDA, or minimal OpenCL, with only small traces of inline PTX. So more realistically, if we had the motivation, we'd likely explore just #ifdef'ing it with something predictable.
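
A minimal sketch of what that #ifdef route might look like, under a few assumptions. The PTX above is a `vote.sync.all`, so the closest CUDA intrinsic is `__all_sync` rather than a ballot. On the AMD side this assumes HIP's legacy `__all` intrinsic is available in the ROCm version in use. The names `thread_vote_all` and `VOTE_FN` are hypothetical, and the host-only branch is just an illustrative fallback so the snippet also compiles as plain C++. Note that `__all` votes over the active lanes of a wavefront (which can be 64 wide) instead of taking an explicit 32-lane mask, so this is a sketch of the idea, not a proof that the two are 1:1.

```cpp
// Portable "do all lanes pass the test?" vote, selected at compile time.
// HIP is checked first so the AMD path wins when building with hipcc.
#if defined(__HIP_PLATFORM_AMD__)
  #include <hip/hip_runtime.h>
  #define VOTE_FN __device__ inline
#elif defined(__CUDACC__)
  #define VOTE_FN __device__ inline
#else
  #define VOTE_FN inline  // assumption: host-only build, treated as a one-lane "warp"
#endif

VOTE_FN unsigned int thread_vote_all(float rSq, float rCritSq) {
    // Same ordered comparison as setp.ge.f32: false if either operand is NaN.
    const int pred = (rSq >= rCritSq);
#if defined(__HIP_PLATFORM_AMD__)
    // Votes over the currently active lanes of the wavefront; no mask argument.
    return __all(pred) ? 1u : 0u;
#elif defined(__CUDACC__)
    // Full mask: all 32 lanes of the warp must reach this call.
    return __all_sync(0xffffffffu, pred) ? 1u : 0u;
#else
    // Illustrative fallback only: a single thread votes for itself.
    return pred ? 1u : 0u;
#endif
}
```

The safety question raised above shows up in the comments: the CUDA path names its participating lanes explicitly, while the `__all` path depends on which lanes happen to be active at the call site, so the two agree only when the whole warp or wavefront reaches the vote together.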