| Short answer: yes. Long answer: if the parent comment's hypothesis is correct, then the following should hold: joining shader logic paths or parts with very different resource usage, notably register usage, should no longer cause occupancy issues. On traditional architectures, those issues could sometimes be resolved by artificially splitting the logic into multiple kernels, with the sole purpose of letting the low-resource portions of the logic run at higher occupancy. If that is the case, it is of course beneficial, not only because such splitting burns development hours (and is potentially error-prone), but also because it introduces overhead of its own: repeating some of the calculations in more than one kernel (instead of reusing the results), storing and subsequently reloading intermediate values to communicate between the parts, and launching and synchronizing the additional kernels. It stands to reason, however, that this is not going to help when the problem is not occupancy but sparse SIMD/wavefront utilization of ALU resources, i.e. when the control flow diverges at sub-wavefront granularity and splitting the code into multiple kernel launches allows the SIMDs/wavefronts to be compacted. Furthermore, the joined shader might still fall a little short of expectations, because joining code does not always result in the compiler identifying and eliminating redundant or repeated calculations, and/or allocating resources better thanks to having access to the entire program at once. That is usually what happens, but occasionally the opposite does: sometimes joining the code sends the compiler's optimizations down a different path and produces code that is actually worse than it would have been had the parts been compiled separately. 
The risk of that increases if the frequency with which shader execution takes specific paths through the program is unusual (statistically atypical, where the statistics are taken over the whole world of programs across space and time :-) ) and the compiler ends up mispredicting which paths are low- and high-probability. |
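
To put rough numbers on the occupancy argument above, here is a minimal Python sketch. It assumes a GPU that sizes a kernel's register allocation once, for the kernel's whole lifetime. The register file size, the wave cap, and the per-phase register counts are all made-up illustrative values, not figures for any real hardware, and `max_resident_waves` is a hypothetical helper.

```python
def max_resident_waves(regs_per_thread, regfile_regs_per_lane=512, hw_wave_cap=16):
    """Waves that fit on one SIMD when registers are reserved statically.

    All defaults are illustrative assumptions, not real hardware limits.
    """
    return min(hw_wave_cap, regfile_regs_per_lane // regs_per_thread)


# Hypothetical register needs of two phases of the same shader logic.
light_phase_regs = 32
heavy_phase_regs = 128

# Joined kernel: the static allocation must cover the worst-case phase,
# so the light phase also runs at the heavy phase's occupancy.
joined = max_resident_waves(max(light_phase_regs, heavy_phase_regs))

# Artificially split kernels: each one is sized for its own phase.
split_light = max_resident_waves(light_phase_regs)
split_heavy = max_resident_waves(heavy_phase_regs)

print("joined kernel:     ", joined, "waves per SIMD")
print("split, light phase:", split_light, "waves per SIMD")
print("split, heavy phase:", split_heavy, "waves per SIMD")
```

With these assumed numbers, the joined kernel is stuck at the heavy phase's occupancy throughout, while the split version lets the light phase run with four times as many resident waves. That gap is what the split buys on a statically allocating design, at the cost of the extra launches and the store/reload traffic between kernels.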