On modern processors, floating-point addition often performs the same as floating-point multiplication. For example, on AMD Zen 4 both have 3 cycles of latency and a reciprocal throughput of 0.5 cycles (two instructions per cycle).
I’m not sure that trick is going to work in the context of computer graphics. To transform vectors or multiply matrices you need a mix of multiplications and additions, or an equivalent sequence of FMAs.
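To make that concrete, here is a rough sketch of a 4x4 matrix times vector transform written both ways. The function names and the row-major layout are my own assumptions for illustration, not taken from any particular graphics library.

```cpp
#include <cmath>

// Hypothetical helper: y = M * x, with M stored as a row-major 4x4 matrix.
// Each output row costs 4 multiplications and 3 additions.
void transform_mul_add(const float m[16], const float x[4], float y[4])
{
    for (int r = 0; r < 4; ++r) {
        y[r] = m[4 * r + 0] * x[0]
             + m[4 * r + 1] * x[1]
             + m[4 * r + 2] * x[2]
             + m[4 * r + 3] * x[3];
    }
}

// Same transform, hypothetical FMA variant: each output row costs
// 1 multiplication followed by 3 fused multiply-adds.
void transform_fma(const float m[16], const float x[4], float y[4])
{
    for (int r = 0; r < 4; ++r) {
        float acc = m[4 * r + 0] * x[0];
        acc = std::fma(m[4 * r + 1], x[1], acc);
        acc = std::fma(m[4 * r + 2], x[2], acc);
        acc = std::fma(m[4 * r + 3], x[3], acc);
        y[r] = acc;
    }
}
```

Either way, the multiplications and additions are interleaved in the same dependency chain, so you can't drop one kind of operation without restructuring the whole computation. Depending on flags, a compiler may contract the first version into FMAs on its own, and the two versions can differ in the last bit because an FMA rounds once instead of twice.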