by h0l0cube
1238 days ago
Can you be more specific? A dot product is about as performant as it gets: linear memory access and SIMD multiply-accumulate. Throw random memory access and flow control in there and it’s a struggle to do it faster, unless the factors are sparse, in which case you can just elide the zero values.
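
To make the sparse case concrete, here is a minimal sketch in plain Python (chosen for readability, not speed). The function names `dense_dot`, `to_sparse`, and `sparse_dense_dot` are made up for illustration, and a real implementation would rely on SIMD intrinsics or a vectorized library rather than an interpreted loop.

```python
def dense_dot(a, b):
    """Dense dot product: one linear pass over both vectors, no branching on values."""
    assert len(a) == len(b)
    acc = 0.0
    for x, y in zip(a, b):
        acc += x * y  # multiply-accumulate
    return acc


def to_sparse(v):
    """Elide zeros: keep only (index, value) pairs for nonzero entries."""
    return [(i, x) for i, x in enumerate(v) if x != 0.0]


def sparse_dense_dot(sparse_a, b):
    """Dot product of a sparse vector with a dense one.

    Only the stored nonzeros are visited, but b is read by index (a gather),
    so the access pattern is no longer strictly linear.
    """
    acc = 0.0
    for i, x in sparse_a:
        acc += x * b[i]
    return acc


a = [0.0, 2.0, 0.0, 0.0, 3.0]
b = [1.0, 4.0, 5.0, 6.0, 7.0]
print(dense_dot(a, b), sparse_dense_dot(to_sparse(a), b))
```

The sparse version does less arithmetic but gives up the linear access pattern, so it likely only wins when most of the entries are zero.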