Hacker News
by dragandj 2253 days ago
This is a reply to bearzoo, but we reached the maximum depth of the comment thread.

>> Nope, we work with float32.

>It seems that is not true.

Well, it is true. We work with float32. I explicitly checked, and NumPy/CuPy reported that the array is indeed float32. What I didn't know at the time is that NumPy/CuPy internally decides on its own to use float64, without a warning and without any way for us to tell it not to. But it is not I (or "us") who work with float64.

I would say that your comment would stand if I had mistakenly told NumPy/CuPy to use float64 while claiming that I work with float32 (which could hypothetically happen if there were a non-obvious but documented option to use float32 that I missed).
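
For readers who want to see the kind of check being described, here is a minimal sketch. It assumes NumPy is installed; `report_dtypes` is a hypothetical helper, and `np.corrcoef` is used only as an illustrative function that is known to promote float32 input to float64. The thread itself does not name the function at this point, so treat that choice as an assumption.

```python
import numpy as np


def report_dtypes(fn, x):
    """Return (input dtype, output dtype) for fn applied to x.

    Hypothetical helper for illustration; not part of NumPy or CuPy.
    """
    out = fn(x)
    return x.dtype, np.asarray(out).dtype


# A small float32 matrix: the caller explicitly asks for single precision.
x = np.random.rand(4, 100).astype(np.float32)

# Assumption: np.corrcoef stands in for the function discussed in the thread.
in_dt, out_dt = report_dtypes(np.corrcoef, x)
print("input:", in_dt, "output:", out_dt)
```

Checking only the input array's dtype, as described above, would not reveal the promotion; comparing it against the result's dtype does.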


Yeah, you passed in float32. The function always promotes to float64 instead. You can argue that NumPy/CuPy should provide a purely float32 option, but instead you went with something like “look at how my expf beats their exp in benchmarks! What are you talking about, ‘not the same function’? I made sure to pass float, not double!” (a libc example to illustrate the point). And it’s not clear to me that you realized the difference before akssri pointed it out, which renders the benchmarks pretty meaningless.

Where did you get that? Even the main title refers to the main point: CuPy does not accelerate NumPy even in a case where it should absolutely be expected to. I then used my implementation to demonstrate that a GPU implementation for such a huge matrix should indeed be many times faster.

I never claimed that my library aims for being a replacement for CuPy, or to have any compatibility with NumPy.

It would be more valuable to the CuPy developers if I debugged CuPy to discover why that problem exists, but why should I be obligated to? I was writing this from the perspective of a user of these libraries.

Would it be an idea to benchmark Neanderthal with float64 as well, just to gather some data on it?

I agree with both of you; you’re each looking at this from a different perspective. It’s perhaps better to just gather timing measurements on a few variants, given the trade-offs that each library has made, and see how those affect implementation and speed.
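
A rough sketch of what such a side-by-side measurement could look like, using only the Python standard library. The names `best_time` and `compare` are hypothetical, and the two workloads are placeholders; a real comparison would plug in the float32 and float64 runs of each library instead.

```python
import time


def best_time(fn, repeats=5):
    """Run fn several times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def compare(variants, repeats=5):
    """Map each variant name to its best observed time."""
    return {name: best_time(fn, repeats) for name, fn in variants.items()}


# Placeholder workloads (assumptions, not the benchmark from the thread).
variants = {
    "generator": lambda: sum(i * i for i in range(100_000)),
    "list": lambda: sum([i * i for i in range(100_000)]),
}

results = compare(variants)
for name, seconds in results.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

For GPU variants, each callable would also need to wait for the device to finish before returning, since kernel launches are asynchronous and the clock would otherwise stop too early.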