by dragandj
2256 days ago
What is flawed there? The point of the article is to show that CuPy very often does not accelerate NumPy, especially on consumer-grade hardware. Most users of NumPy/CuPy do not know this, and the docs lead them to think it does. The reason is that CuPy is poorly implemented. And CuPy is poorly implemented because it is constrained by what NumPy does, and NumPy, in turn, does things that are OK on the CPU but often translate poorly to the GPU.
|
You're comparing float32 vs float64 computation. I don't need to tell you how much slower DGEMM is than SGEMM, esp. on the GPU (you mention this in the post yourself!).
NumPy defaults to float64 for precision reasons, and CuPy simply follows its behavior. This is precisely why I noted that the float32 version runs 3x faster on the CPU.
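To make the dtype point concrete, here is a rough sketch (not the benchmark from the article) that checks which dtype NumPy picks by default and times a matrix product in both precisions on the CPU. The helper names `default_float_dtype` and `time_matmul`, the matrix size, and the repeat count are made up for illustration; actual ratios depend on your BLAS build and hardware, so no expected numbers are shown.

```python
import time

import numpy as np


def default_float_dtype():
    """Return the dtype NumPy uses when no dtype is requested."""
    return np.random.rand(2, 2).dtype


def time_matmul(dtype, n=512, repeats=3):
    """Best-of-`repeats` wall-clock seconds for an n x n product in `dtype`."""
    rng = np.random.default_rng(0)
    # Cast explicitly so both operands (and hence the product) use `dtype`.
    a = rng.random((n, n)).astype(dtype)
    b = rng.random((n, n)).astype(dtype)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("default dtype:", default_float_dtype())
    print("float64 matmul (s):", time_matmul(np.float64))
    print("float32 matmul (s):", time_matmul(np.float32))
```

The same dtype rule carries over to CuPy arrays, but timing on the GPU also needs a device synchronize before reading the clock, so treat this as a CPU-only illustration.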
> The reason for that is that CuPy is poorly implemented.
It's a cheap shot to call something 'poorly implemented' when you don't understand what you're benchmarking.