by akssri
2252 days ago
- The function takes 29.4 ms in CuPy; NumPy takes 427 ms. Happy?
- Broadcasting semantics plus division take care of the outer-product normalization. That is 2 BLAS L1 ops in the size of the matrix and the input (~ xSCAL).

Pedantry is still not an argument.
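
For readers following along, here is a minimal sketch of what "broadcasting + division" could look like. It is not the function timed above, which was not posted in this comment. It assumes the normalization means dividing each entry `a[i, j]` by `d[i] * d[j]`; the name `normalize_by_outer` and the inputs are hypothetical.

```python
import numpy as np


def normalize_by_outer(a, d):
    """Hypothetical helper: divide a[i, j] by d[i] * d[j].

    Assumption: this is the kind of outer-product normalization meant above.
    Two broadcast divisions avoid building np.outer(d, d) explicitly.
    """
    # d[:, None] has shape (n, 1) and scales rows;
    # d[None, :] has shape (1, n) and scales columns.
    return a / d[:, None] / d[None, :]


# Small check against the explicit outer-product form.
rng = np.random.default_rng(0)
a = rng.random((4, 4))
d = rng.random(4) + 1.0  # keep the divisors away from zero
assert np.allclose(normalize_by_outer(a, d), a / np.outer(d, d))
```

Since CuPy mirrors much of the NumPy array API, the same expression should carry over with `import cupy as np`, though that is untested here and says nothing about the timings quoted above.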
|
Can you please post your implementation of this function here, so I can try it on my machine and compare it to Neanderthal?