by dragandj 2253 days ago
Good advice. And, yet, you've taken it pretty seriously to diss Clojure/Neanderthal and my blog post, mostly by talking about unrelated stuff and projects that the post didn't even mention. And while introducing these themes left and right you didn't even bother to show some code related to this topic, just a suggestion of great projects by cool people.

Yes, you showed the PyTorch code related to the blog post that confirms what the blog post says, but when I pointed out that the code has incorrect functionality (by missing some calculations) you didn't even bother to correct it, or to confirm that the code is good and that I'm wrong.

So, it seems that your standard is that it is enough for one side to throw bits and pieces around and call it a day, and for the other to run around and prove that their stuff is better than everything that could possibly be done in every technology.

I choose to stick to the theme. The theme is CuPy, NumPy, Clojure & Neanderthal. The related theme could be code in another technology. Great - write about it. But, even if every other technology were a million times better than what I describe in the article, it does not change the fact that CuPy and NumPy have the issue I've described.


> And, yet, you've taken it pretty seriously to diss Clojure/Neanderthal and my blog post

I have not - all I've said so far is that your benchmark is flawed.

The fact that the code fragment above assumes zero-mean data (thus using 2 fewer L1 ops) doesn't change a single thing in anything that has been written; to wit, the timings change to 28.6 ms (GPU) and 333 ms (CPU). Pedantry is not an argument.

I still don't get how the fact that someone could implement the same thing that I did in Clojure in PyTorch has anything to do with NumPy and CuPy, or my benchmark?

BTW, your PyTorch code is still incorrect (or so it seems to me, although I don't use PyTorch so I can't try it on the computer). The formula for correlation requires division by sigma_x * sigma_y (which has dimension n x n), and you are dividing by (sigma_x)^2 (which has dimension n). So you still forgot at least one L2 operation that computes all combinations of sigma_x_y. A couple of operations here, a couple of operations there, an edge case here, an edge case there, it adds up. That's why people use NumPy/CuPy after all...

- Function in CuPy takes 29.4 ms, NumPy takes 427 ms. Happy?

- Broadcasting semantics + division takes care of the outer-product normalization. That is 2 L1 ops whose cost scales with the size of the matrix & the input (~ xSCAL).

Pedantry is still not an argument.
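
For readers following the argument about normalization, here is a minimal NumPy sketch of the idea being discussed: dividing the n x n covariance matrix by sigma_x * sigma_y using two broadcast divisions instead of forming the outer product explicitly. It is not the PyTorch or CuPy code from this thread (which is not reproduced here); the function name `corr_broadcast` and the random test data are assumptions made purely for illustration.

```python
import numpy as np

def corr_broadcast(x):
    """Pearson correlation between the columns of x (m samples x n variables).

    Hypothetical helper for illustration; not the code from the thread.
    """
    m = x.shape[0]
    xc = x - x.mean(axis=0)          # center each column
    cov = xc.T @ xc / (m - 1)        # n x n covariance matrix
    sigma = np.sqrt(np.diag(cov))    # n standard deviations
    # Two broadcast divisions: first each row i by sigma[i], then each
    # column j by sigma[j]. Entry (i, j) ends up divided by
    # sigma[i] * sigma[j], the same as dividing by np.outer(sigma, sigma).
    return cov / sigma[:, None] / sigma[None, :]

# Illustrative data only: 1000 samples of 5 variables.
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 5))
r = corr_broadcast(x)

# Cross-check against NumPy's own correlation routine.
print(np.allclose(r, np.corrcoef(x, rowvar=False)))
```

Dividing by `sigma` alone, or by `sigma ** 2` without reshaping, would broadcast along only one axis and scale entry (i, j) by the wrong factor, which appears to be the distinction the two posters are arguing over.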

Thank you so much, that's phenomenal news for me! (Since I can make Neanderthal code go at 23 ms (GPU) and 3XX ms (CPU) when I implement it as NumPy/CuPy/PyTorch does, sans float64 conversion, of course.) You saved me from having to fiddle with Python (which I don't particularly enjoy). Thanks again!

Can you please post your implementation of this function, here, so I can try it on my machine and compare it to Neanderthal?