| I need something like this for a switch() command (technically a list of arbitrary functions). Sort of like up to N branches in one step. The idea is to take some number of inputs A, B, C, ... and conceptually perform all of the possible functions simultaneously, then keep the one that's desired and throw the rest away. For any arbitrary logic. Ideally using fewer operations than all of that, but that's optional. Driven by one variable, it would look like: // branch version
switch(var1)
{
case 1:
var4 = var2 + var3;
break;
case 2:
var4 = var2 - var3;
break;
case 3:
var4 = var2 * var3;
break;
case 4:
var4 = var2 / var3;
break;
// ...
default:
var4 = 0; // optional and arbitrary
break;
}
// branchless version
var4 = magic(var1, var2, var3);
I don't know how to do this outside of programmable hardware like an FPGA. The problem is that it's extremely challenging to write/solve the formulas that map ordinary arithmetic and bitwise functions to the larger set of functions.So for now, I may just use a switch() command in a shader and let it figure out any reusable logic internally. I don't know if shaders have come far enough to allow that performantly, or if they end up just calculating everything and throwing away the paths not taken. Which would suggest that the max speed would be O(1/N) for the number of cases. Does anyone know? Maybe truth tables? Or a SAT solver? I'm also not sure how this would work for floating point, but that's optional too. Edit: I updated the cases to show how var1 controls the math performed on var2, var3 and var4. |
Eg. If we had just 1 or 2 as options for the case statements note that test1 = xor(var1 & 0x1, var1>>1 & 0x1) & var1 will, with integer rounding, equal ‘1’ if var1 is one or ‘0’ if var1 is 2,3 or 4. (If it goes to 5 you need another xor at the next highest bit).
Likewise test2 = xor(var1 & 0x1, var1>>1 & 0x1) & var1<<1 equals ‘0’ if var1 is 1, 3 or 4 but it equals ‘1’ if it’s two.
So (test1 * (value on var1 being 1) + (test2 * (value on var1 being 2)) will get you a branchless version for the simple either 1 or 2 case. This will pretty much always be much slower than the branching version but some types of systems out there don’t really do branching and are linear. This type of zeroing trick is really common on matrix multipliers.
Essentially your creating binary logic gates that give 0 or 1 for the exact value you want and blindly multiplying that by the value it would have produced if 1.