by akssri
2253 days ago
> What is /flawed/ there?

You're comparing float32 vs float64 computation. I don't need to tell you how much slower DGEMM is vs SGEMM esp. on the GPU (you mention this in the post yourself!). Numpy does this for precision reasons, and CuPy simply follows its behavior. This is precisely why I noted that the float32 version runs 3x faster on the CPU.

> The reason for that is that CuPy is poorly implemented.

It's a cheap shot to call something 'poorly implemented' when you don't understand what you're benchmarking.
> It would be interesting to compare it against a float64 version in Neanderthal as well

I agree with that.
That said, to me a flawed benchmark is one that isn't indicative of the performance you can expect when actually using the library on a real-world use case. For now, this benchmark of NumPy/CuPy does seem indicative of what you'd expect.
Now, the next question: for model accuracy vs. scale, is float64 coercion always the ideal trade-off? What if you still need to squeeze out more performance? Is it really a bad idea to do so by going down to float32, especially considering how much more a GPU can accelerate that?
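For anyone who wants to check the float32 vs float64 gap on their own machine, here is a rough CPU-only sketch with NumPy. It is not the benchmark from the post: the helper name `time_matmul`, the matrix size, and the repeat count are all made up for illustration, and the ratio you see will depend on your BLAS build and hardware.

```python
import time
import numpy as np

def time_matmul(dtype, n=512, repeats=3):
    """Best wall-clock time in seconds for an n x n matrix product in `dtype`.

    Hypothetical helper for illustration; n and repeats are arbitrary choices.
    """
    rng = np.random.default_rng(0)
    # rng.random returns float64, so cast explicitly to the dtype under test.
    a = rng.random((n, n)).astype(dtype)
    b = rng.random((n, n)).astype(dtype)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        c = a @ b
        best = min(best, time.perf_counter() - start)
    # The product of two same-dtype arrays keeps that dtype.
    assert c.dtype == dtype
    return best

t32 = time_matmul(np.float32)
t64 = time_matmul(np.float64)
print(f"float32: {t32:.4f}s  float64: {t64:.4f}s  ratio: {t64 / t32:.2f}x")

# Mixing precisions silently promotes the result to float64,
# which is one way a "float32" benchmark ends up measuring float64 work.
x32 = np.ones((2, 2), dtype=np.float32)
x64 = np.ones((2, 2))  # NumPy's default float dtype is float64
mixed = x32 @ x64
print(mixed.dtype)
```

If both libraries are fed arrays of the same dtype, the comparison is at least like for like; whether float32 is accurate enough is a separate question that depends on the model.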