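The fact is that a branch the whole warp either takes or skips is relatively cheap on modern hardware, even if the condition depends on per-thread data. The expensive case is when threads within the same warp disagree, because the warp then has to run both sides of the branch.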
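To make the "whole warp" part concrete, here is a rough host-side sketch in plain C++ (not GPU code) that groups thread indices into warps and checks whether a per-thread condition comes out the same for every lane of each warp. The names `BranchStats` and `classify_branch` are made up for illustration, and the warp size of 32 is an assumption that matches current NVIDIA hardware; other GPUs may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Host-side model only: it counts which warps WOULD diverge on a branch.
// It does not measure real GPU cost.
struct BranchStats {
    std::size_t uniform_warps = 0;    // all lanes agree: only one path runs
    std::size_t divergent_warps = 0;  // lanes disagree: both paths run
};

// pred(tid) is the per-thread branch condition (hypothetical helper).
// warp_size defaults to 32, which is an assumption, not a universal value.
BranchStats classify_branch(std::size_t num_threads,
                            const std::function<bool(std::size_t)>& pred,
                            std::size_t warp_size = 32) {
    assert(warp_size > 0);
    BranchStats stats;
    for (std::size_t base = 0; base < num_threads; base += warp_size) {
        // The last warp may be partially filled.
        const std::size_t end =
            (base + warp_size < num_threads) ? base + warp_size : num_threads;
        const bool first = pred(base);
        bool uniform = true;
        for (std::size_t tid = base + 1; tid < end; ++tid) {
            if (pred(tid) != first) {
                uniform = false;
                break;
            }
        }
        if (uniform) {
            ++stats.uniform_warps;
        } else {
            ++stats.divergent_warps;
        }
    }
    return stats;
}
```

With 128 threads, a condition like `(tid / 32) % 2 == 0` depends on the thread index but is constant within each warp, so all four warps count as uniform. A condition like `tid % 2 == 0` splits every warp down the middle, so all four count as divergent.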