by akssri 2300 days ago

Common Lispers have had the ability to write CUDA kernels (with shaders) on-the-fly with a DSL in cl-cuda for many years now. See,

https://github.com/takagi/cl-cuda

CuPy (a project in which takagi is/was actively involved) can also compile kernels from the Python REPL, but the kernel code needs to be written in C and passed in as a string (PyOpenCL can do the same for OpenCL). In fact, not too many years ago, the kernels for gradient descent, max pooling, etc. in Chainer were all written entirely in Python. Projects like JAX take this to another level by having the ability to transform the bytecode of a restricted class of Python functions straight into CUDA kernels.

I can't speak much about Clojure/Neanderthal, but I'd advise people to stop dissing projects they are not familiar with, least of all with silly benchmarks like the above.

1 comments

dragandj 2300 days ago

Good advice. And, yet, you've taken it pretty seriously to diss Clojure/Neanderthal and my blog post, mostly by talking about unrelated stuff and projects that the post didn't even mention. And while introducing these themes left and right you didn't even bother to show some code related to this topic, just a suggestion of great projects by cool people.

Yes, you showed the PyTorch code related to the blog post that confirms what the blog post says, but when I pointed out that the code has incorrect functionality (by missing some calculations) you didn't even bother to correct it, or to confirm that the code is good and that I'm wrong.

So, it seems that your standard is that it is enough for one side to throw bits and pieces around and call it a day, and for the other to run around and prove that their stuff is better than everything that could possibly be done in every technology.

I choose to stick to the theme. The theme is CuPy, NumPy, Clojure & Neanderthal. The related theme could be code in another technology. Great - write about it. But, even if every other technology were a million times better than what I describe in the article, it does not change the fact that CuPy and NumPy have the issue I've described.

akssri 2300 days ago

> And, yet, you've taken it pretty seriously to diss Clojure/Neanderthal and my blog post

I have not - all I've said so far is that your benchmark is flawed.

The fact that the code fragment above assumes zero mean data (thus using 2 fewer L1 ops) doesn't change a single thing in anything that has been written; to wit, the timings change to 28.6 ms (GPU) and 333 ms (CPU). Pedantry is not an argument.

dragandj 2300 days ago

I still don't get how the fact that someone could implement the same thing that I did in Clojure in PyTorch has anything to do with NumPy and CuPy, or my benchmark?

BTW, your PyTorch code is still incorrect (or so it seems to me, although I don't use PyTorch so I can't try it on the computer). The formula for correlation requires division by sigma_x * sigma_y (which has dimension n x n), and you are dividing by (sigma_x)^2 (which has dimension n). So you still forgot at least one L2 operation that computes all combinations of sigma_x_y. A couple of operations here, a couple of operations there, an edge case here, an edge case there, it adds up. That's why people use NumPy/CuPy after all...
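
For readers following the formula, here is a minimal NumPy sketch of the normalization described above, where entry (i, j) of the covariance matrix is divided by sigma_i * sigma_j. It is not code from either participant: the function name `correlation_outer`, the sample data, and the choice of rows as observations are assumptions made for illustration.

```python
import numpy as np

def correlation_outer(x):
    """Correlation matrix of the columns of x (rows are observations)."""
    centered = x - x.mean(axis=0)                    # subtract each column's mean
    cov = centered.T @ centered / (x.shape[0] - 1)   # n x n covariance matrix
    sigma = np.sqrt(np.diag(cov))                    # length-n standard deviations
    # Explicit n x n matrix of sigma_i * sigma_j, then elementwise division.
    return cov / np.outer(sigma, sigma)

# Hypothetical data: 200 observations of 5 variables with different scales.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5)) * np.array([1.0, 2.0, 3.0, 4.0, 5.0])
r = correlation_outer(x)
```

The result can be compared against `np.corrcoef(x, rowvar=False)`, which computes the same quantity.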

akssri 2299 days ago

- The function takes 29.4 ms in CuPy and 427 ms in NumPy. Happy?

- Broadcasting semantics + division takes care of the outer-product normalization. This is 2 L1 ops in size of the matrix & the input (~ xSCAL).
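
The broadcasting point can be sketched as follows. The actual implementation was not posted in the thread, so this is only an assumed illustration: `correlation_broadcast` is a hypothetical name, and the two broadcast divisions stand in for the "2 L1 ops" mentioned above without forming the sigma outer product explicitly.

```python
import numpy as np

def correlation_broadcast(x):
    """Correlation matrix of the columns of x, normalized via broadcasting."""
    centered = x - x.mean(axis=0)                    # subtract each column's mean
    cov = centered.T @ centered / (x.shape[0] - 1)   # n x n covariance matrix
    sigma = np.sqrt(np.diag(cov))                    # length-n standard deviations
    # Divide row i by sigma_i, then column j by sigma_j; together this equals
    # dividing entry (i, j) by sigma_i * sigma_j.
    return cov / sigma[:, None] / sigma[None, :]

# Hypothetical data: 300 observations of 4 variables.
rng = np.random.default_rng(1)
data = rng.normal(size=(300, 4)) * np.array([0.5, 1.0, 2.0, 4.0])
r_broadcast = correlation_broadcast(data)
```

Each division scales the n x n matrix by a length-n vector, which is why no separate outer-product step appears in the code.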

Pedantry is still not an argument.

dragandj 2299 days ago

Thank you so much, that's phenomenal news for me! (I can make the Neanderthal code run at 23 ms (GPU) and 3XX ms (CPU) when I implement it the way NumPy/CuPy/PyTorch does, sans float64 conversion, of course.) You saved me from having to fiddle with Python (which I don't particularly enjoy). Thanks again!

Can you please post your implementation of this function, here, so I can try it on my machine and compare it to Neanderthal?
