by slavik81
2235 days ago
I'm actually rather bad at this, but my understanding is that horizontal operations (the ones that combine lanes within a single register) are relatively slow. The easiest way to get throughput out of SIMD is to have a structure representing 4 points, with a vec4 of your X values, a vec4 of your Y values, and a vec4 of your Z values. Then you can do 4 dot products at once, easily and efficiently, using only a handful of vertical packed instructions. (If figuring out how to effectively use a structure like that sounds difficult and annoying, that's because it is.)
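
A rough sketch of that layout in C, assuming an x86 target with SSE. The `Points4` type and `dot4` function are made-up names for illustration, and the scalar fallback is only there so the code still builds elsewhere. Each `__m128` holds the same component of four different points, so the dot products need three multiplies and two adds, with no horizontal instruction anywhere:

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE: _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#endif

/* Four 3D points in structure-of-arrays form (hypothetical type):
   x[i], y[i], z[i] together make up point i. */
typedef struct {
    float x[4];
    float y[4];
    float z[4];
} Points4;

/* Computes out[i] = dot(a_i, b_i) for i = 0..3. */
static void dot4(const Points4 *a, const Points4 *b, float out[4])
{
#ifdef HAVE_SSE
    /* Vertical (lane-wise) operations only: lane i never touches lane j. */
    __m128 xx = _mm_mul_ps(_mm_loadu_ps(a->x), _mm_loadu_ps(b->x));
    __m128 yy = _mm_mul_ps(_mm_loadu_ps(a->y), _mm_loadu_ps(b->y));
    __m128 zz = _mm_mul_ps(_mm_loadu_ps(a->z), _mm_loadu_ps(b->z));
    _mm_storeu_ps(out, _mm_add_ps(_mm_add_ps(xx, yy), zz));
#else
    /* Scalar fallback for targets without SSE; same result. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a->x[i] * b->x[i] + a->y[i] * b->y[i] + a->z[i] * b->z[i];
#endif
}
```

The annoying part is everything around this function: your data has to already live in (or be shuffled into) groups of four, which tends to leak into the rest of the code.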