by ckitching
697 days ago
I compiled your function with SCALE for gfx1030:

    .p2align 2 ; -- Begin function _Z15ptx_thread_voteff
    .type _Z15ptx_thread_voteff,@function
    _Z15ptx_thread_voteff: ; @_Z15ptx_thread_voteff
    ; %bb.0: ; %entry
    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
    s_waitcnt_vscnt null, 0x0
    v_cmp_ge_f32_e32 vcc_lo, v0, v1
    s_cmp_eq_u32 vcc_lo, -1
    s_cselect_b32 s4, -1, 0
    v_cndmask_b32_e64 v0, 0, 1, s4
    s_setpc_b64 s[30:31]
    .Lfunc_end1:
    .size _Z15ptx_thread_voteff, .Lfunc_end1-_Z15ptx_thread_voteff
    ; -- End function

What were the safety concerns you had? This code seems to be something like `return __all_sync(0xffffffff, rSq >= rCritSq) ? 1 : 0`, right?
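
To make that reading of the assembly concrete, here is a rough host-side C++ model of the three instructions that do the work. It is only a sketch under assumptions: a 32-lane wave (wave32) with every lane active, and the names `kWaveSize`, `compare_ge_mask` and `vote_all` are made up for illustration and are not part of SCALE or any AMD API.

```cpp
#include <array>
#include <cstdint>

// Assumption: one wave of 32 lanes, all active (wave32 on gfx1030).
constexpr int kWaveSize = 32;

// Models `v_cmp_ge_f32 vcc_lo, v0, v1`: each lane contributes one bit
// (lane i -> bit i) to a 32-bit mask, set when a[i] >= b[i].
inline std::uint32_t compare_ge_mask(const std::array<float, kWaveSize>& a,
                                     const std::array<float, kWaveSize>& b) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < kWaveSize; ++lane) {
        if (a[lane] >= b[lane]) {
            mask |= (std::uint32_t{1} << lane);
        }
    }
    return mask;
}

// Models `s_cmp_eq_u32 vcc_lo, -1` plus the select that follows:
// the result is 1 only if every lane's comparison was true.
inline int vote_all(const std::array<float, kWaveSize>& rSq,
                    const std::array<float, kWaveSize>& rCritSq) {
    return compare_ge_mask(rSq, rCritSq) == 0xFFFFFFFFu ? 1 : 0;
}
```

In this model, a single lane with `rSq < rCritSq` clears its bit in the mask, so `vote_all` returns 0 for every lane, which is the behavior an all-lanes vote would be expected to have.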
|
I'm not familiar enough with AMD to know whether additional synchronization is needed. ChatGPT recommended adding barriers beyond what that gave, but again, I'm not familiar with AMD instructions.