Hacker News
by jampekka 1936 days ago
OTOH PyTorch seems to be highly explosive if you try to use it outside the mainstream use (i.e. neural networks). There's sadly no performant autodiff system for general purpose Python. Numba is fine for performance, but does not support autodiff. JAX aims to be sort of general purpose, but in practice it is quite explosive when doing something other than neural networks.

A lot of this is probably due to supporting CPUs and GPUs with the same interface. There are quite profound differences in how CPUs and GPUs are programmed, so the interface tends to restrict especially more "CPU-oriented" approaches.

I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs), but Python really needs a general purpose, high performance autodiff.
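To make "general purpose autodiff" concrete, here is a minimal sketch of forward-mode differentiation with dual numbers in plain Python. It is only an illustration of the idea, not how PyTorch, JAX or Numba work internally, and the names `Dual`, `derivative` and `f` are made up for this example. The point is that the differentiated function uses an ordinary data-dependent `if` and a `for` loop with no special tracing constructs. It is also slow, which is exactly the gap the comment above is complaining about.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (hypothetical helper for this sketch)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __gt__(self, other):
        # Comparisons look only at the value, so normal branching works.
        return self.val > _lift(other).val


def _lift(x):
    """Promote a plain number to a Dual with zero derivative."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    x = _lift(x)
    # Chain rule: d/dx sin(u) = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(float(x), 1.0)).der


def f(x):
    # Ordinary Python control flow: a data-dependent branch and a loop.
    if x > 0:
        y = x * x
    else:
        y = sin(x)
    for _ in range(3):
        y = y * x + 1.0
    return y
```

For `x > 0` the function works out to `x**5 + x**2 + x + 1`, so `derivative(f, 2.0)` returns `85.0`.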

7 comments

> I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs), but Python really needs a general purpose, high performance autodiff.

As someone who works with machine learning models day-to-day (yes, some deep NNs, but also other stuff) - GPUs really seem unbeatable to me for anything gradient-optimization-of-matrices (i.e. like 80% of what I do) related. Even inference in a relatively simple image classification net takes an order of magnitude longer on CPU than GPU on the smallest dataset I'm working with.

Was this a comment about specific models that have a reputation as being more difficult to optimize on the GPU (like tree-based models - although Microsoft is working in this space)? Or am I genuinely missing some optimization techniques that might let me make more use of our CPU compute?

For gradient-optimization-of-matrices, for sure. Just make sure that you don't use gradient-optimization-of-matrices only because it runs well on GPUs. There may well be more efficient approaches to your problem that are infeasible on the GPU's wide SIMD architecture, and you may miss them if you tie yourself to GPUs.

In general it's more that some specific models are easy for GPUs. Most models probably are not.

I really don't understand the "GPUs are overrated" comment. As someone who uses PyTorch a lot and GPU compute almost every day, I see an order of magnitude difference in speed for most common CUDA/OpenCL-accelerated computations.

PyTorch makes it pretty easy to get large GPU-accelerated speed-ups with a lot of code we traditionally limited to NumPy. And this is for things that have nothing to do with neural networks.

For a lot of cases you don't really need that much performance. Modern processors are plenty fast. It seems that current push to use GPU also pushes people towards GPU oriented solutions, such as using huge NNs for more or less anything, while other approaches would in many cases be magnitudes more efficient and robust.

GPUs (or "wide SIMDs" more generally) have quite profound limitations. Branching is very limited, recursion is more or less impossible and parallelism is possible only for identical operations. This makes for example many recursion-based time-series methods (e.g. Bayesian filtering) very tricky or practically impossible. From what I gather, running recurrent networks is also tricky and/or hacky on GPU.
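As a rough illustration of the kind of recursion meant here, below is a minimal sketch of a scalar Kalman filter, one of the simplest Bayesian filters. It assumes a random-walk state model, and the function name and default parameter values are chosen purely for this example. In this straightforward formulation every step consumes the posterior of the previous step, so the loop over time runs sequentially and each step is only a handful of scalar operations.

```python
def kalman_filter_1d(observations, prior_mean=0.0, prior_var=1.0,
                     process_var=0.01, obs_var=0.25):
    """Scalar random-walk Kalman filter.

    Returns a list of (mean, variance) posteriors, one per observation.
    All default values are illustrative assumptions.
    """
    mean, var = prior_mean, prior_var
    posteriors = []
    for z in observations:
        # Predict: the state is a random walk, so the mean carries over
        # and the uncertainty grows by the process variance.
        var = var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        # This posterior is the prior for the next step, which is the
        # sequential dependency between time steps.
        posteriors.append((mean, var))
    return posteriors
```

Calling `kalman_filter_1d([1.0, 1.0], process_var=0.0, obs_var=1.0)` moves the mean from the prior of 0 toward the observations while the variance shrinks at each step.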

GPUs are great for some quite specific, yet quite generally applicable, solutions, like tensor operations etc. But being tied to GPUs' inherent limitations also limits the space of approaches that are feasible to use. And in the long run this can stunt the development of different approaches.

> For a lot of cases you don't really need that much performance. Modern processors are plenty fast. It seems that current push to use GPU also pushes people towards GPU oriented solutions, such as using huge NNs for more or less anything, while other approaches would in many cases be magnitudes more efficient and robust.

for instance?

I still don't get the criticism of PyTorch. If anything, you can get the best of both worlds in many ways, since its API supports GPU and CPU operations in exactly the same way.

What do you mean by “seems to be highly explosive”? I have used PyTorch to model many non-DNN things and have not experienced highly explosive behavior. (It could be that I have become too familiar with the common footguns, though.)

I get what you mean by the "GPUs are overrated" comment, which is that they're thought of as essential in many cases when they're probably not, but in many domains like NLP, GPUs are a hard requirement for getting anything done.

Have you tried using Enzyme* on Numba IR?

* https://enzyme.mit.edu

Wait, what? JAX and PyTorch are both used in a lot more areas than NNs. JAX is even considered to do better in that department in terms of performance than all of Julia, so what are you talking about?

GP makes a fair point about JAX still requiring a limited subset of Python though (mostly control flow stuff). Also, there's really no in-library way to add new kernels. This doesn't matter for most ML people but is absolutely important in other domains. So Numba/Julia/Fortran are "better in that department in terms of performance" than JAX because the latter doesn't even support said functionality.

> JAX is even considered to do better in that department in terms of performance than all of Julia, so what are you talking about?

Please provide sources for this claim

> There's sadly no performant autodiff system for general purpose Python.

Like there is for general purpose Julia code? (https://github.com/FluxML/Zygote.jl)

> I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs),

Do you run much machine learning code? All those matrix multiplications run a good bit faster on the GPU.