| HN Mirror


	by std_badalloc 1933 days ago
	PyTorch is the most impressive piece of software engineering that I know of. So yeah, it's a nice interface for writing fast numerical code. And for zero effort you can change between running on CPUs, GPUs and TPUs. There's some compiler functionality in there for kernel fusing and more. Oh, and you can autodiff everything. There's just an incredible amount of complexity being hidden behind a very simple interface there, and it just continues to impress me how they've been able to get this so right.

5 comments

sillysaurusx 1932 days ago

> and TPUs

BS. There's so much effort involved in getting PyTorch working on TPUs, and at the end of it, it's incredibly slow compared to what you have in TensorFlow. I hate this myth and wish it would die.

Old thread on this, detailing exactly why this is true: https://news.ycombinator.com/item?id=24721229

jampekka 1933 days ago

OTOH PyTorch seems to be highly explosive if you try to use it outside the mainstream use (i.e. neural networks). There's sadly no performant autodiff system for general purpose Python. Numba is fine for performance, but does not support autodiff. JAX aims to be sort of general purpose, but in practice it is quite explosive when doing something other than neural networks.

A lot of this is probably due to supporting CPUs and GPUs with the same interface. There are quite profound differences in how CPUs and GPUs are programmed, so the interface tends to restrict especially more "CPU-oriented" approaches.

I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs), but Python really needs a general purpose, high performance autodiff.
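
To make "general purpose autodiff" concrete, here is a minimal sketch of forward-mode autodiff with dual numbers in plain Python. It is not from the thread and is not how PyTorch or JAX work internally; the names `Dual`, `derivative` and `f` are made up for illustration. It differentiates straight through ordinary branches and loops, but it is interpreter-speed code, which is the gap being described above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value together with its derivative with respect to one input."""
    val: float
    der: float = 0.0

    @staticmethod
    def lift(x):
        # Treat plain numbers as constants (derivative 0).
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def sin(x):
    # Chain rule for sine: d/dx sin(u) = cos(u) * u'.
    x = Dual.lift(x)
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)


def derivative(f, x):
    """Evaluate df/dx at x by seeding the input derivative with 1."""
    return f(Dual(float(x), 1.0)).der


def f(x):
    # Ordinary Python control flow: a data-dependent branch and a loop.
    if x.val > 0:
        y = x * x
    else:
        y = sin(x)
    for _ in range(3):
        y = y * x + 1.0
    return y
```

For positive inputs `f` reduces to x^5 + x^2 + x + 1, so `derivative(f, 2.0)` can be checked by hand against 5x^4 + 2x + 1.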

wxnx 1932 days ago

> I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs), but Python really needs a general purpose, high performance autodiff.

As someone who works with machine learning models day-to-day (yes, some deep NNs, but also other stuff) - GPUs really seem unbeatable to me for anything gradient-optimization-of-matrices (i.e. like 80% of what I do) related. Even inference in a relatively simple image classification net takes an order of magnitude longer on CPU than GPU on the smallest dataset I'm working with.

Was this a comment about specific models that have a reputation as being more difficult to optimize on the GPU (like tree-based models - although Microsoft is working in this space)? Or am I genuinely missing some optimization techniques that might let me make more use of our CPU compute?

jampekka 1932 days ago

For gradient-optimization-of-matrices, for sure. Just make sure that you don't use gradient-optimization-of-matrices just because it runs well on GPUs. There may well be more efficient approaches to your problems that are infeasible on the GPUs' wide SIMD architecture, and you may miss them if you tie yourself to GPUs.

In general it's more that some specific models are easy for GPUs. Most models probably are not.

_coveredInBees 1932 days ago

I really don't understand the "GPUs are overrated" comment. As someone who uses PyTorch a lot and GPU compute almost every day, I see an order of magnitude difference in speed for most common CUDA / OpenCL accelerated computations.

PyTorch makes it pretty easy to get large GPU-accelerated speed-ups with a lot of code we traditionally limited to NumPy. And this is for things that have nothing to do with neural networks.

jampekka 1932 days ago

For a lot of cases you don't really need that much performance. Modern processors are plenty fast. It seems that the current push to use GPUs also pushes people towards GPU-oriented solutions, such as using huge NNs for more or less anything, while other approaches would in many cases be orders of magnitude more efficient and robust.

GPUs (or "wide SIMDs" more generally) have quite profound limitations. Branching is very limited, recursion is more or less impossible and parallelism is possible only for identical operations. This makes for example many recursion-based time-series methods (e.g. Bayesian filtering) very tricky or practically impossible. From what I gather, running recurrent networks is also tricky and/or hacky on GPU.

GPUs are great for some quite specific, yet quite generally applicable, solutions, like tensor operations etc. But being tied to GPUs' inherent limitations also limits the space of approaches that are feasible to use. And in the long run this can stunt the development of different approaches.
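
To make the Bayesian filtering point concrete, here is a minimal sketch, assuming a scalar Kalman filter on a random-walk state as the example. It is not from the thread; the function name `kalman_filter_1d` and its parameters are illustrative. Each step consumes the posterior produced by the previous step, so in this straightforward form the time steps have to run one after another.

```python
def kalman_filter_1d(observations, process_var, obs_var, mean0=0.0, var0=1.0):
    """Filter a random-walk state observed with Gaussian noise.

    Returns a list of posterior (mean, variance) pairs, one per observation.
    """
    mean, var = mean0, var0
    posteriors = []
    for z in observations:
        # Predict: a random walk keeps the mean and inflates the variance.
        var = var + process_var
        # Update: weight the observation by the Kalman gain.
        gain = var / (var + obs_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        # The next iteration starts from this posterior, which is the
        # sequential dependency between time steps.
        posteriors.append((mean, var))
    return posteriors
```

The arithmetic per step is tiny; the difficulty for wide SIMD hardware is the dependency chain along the time axis, not the amount of computation.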

mpfundstein 1931 days ago

> For a lot of cases you don't really need that much performance. Modern processors are plenty fast. It seems that the current push to use GPUs also pushes people towards GPU-oriented solutions, such as using huge NNs for more or less anything, while other approaches would in many cases be orders of magnitude more efficient and robust.

for instance?

_coveredInBees 1932 days ago

I still don't get the criticism of PyTorch. If anything, you can get the best of both worlds in many ways, with their API supporting GPU and CPU operations in exactly the same way.

ahendriksen 1933 days ago

What do you mean by “seems to be highly explosive”? I have used PyTorch to model many non-DNN things and have not experienced highly explosive behavior. (Could be that I have become too familiar with common footguns, though.)

lgessler 1932 days ago

I get what you mean by the "GPUs are overrated" comment, which is that they're thought of as essential in many cases when they're probably not, but in many domains like NLP, GPUs are a hard requirement for getting anything done.

jl2718 1933 days ago

Have you tried using Enzyme* on Numba IR?

* https://enzyme.mit.edu

komuher 1933 days ago

Wait, what? JAX and PyTorch are used in a lot more areas than NNs. JAX is even considered to do better in that department in terms of performance than all of Julia, so what are you talking about?

BadInformatics 1932 days ago

GP makes a fair point about JAX still requiring a limited subset of Python though (mostly control flow stuff). Also, there's really no in-library way to add new kernels. This doesn't matter for most ML people but is absolutely important in other domains. So Numba/Julia/Fortran are "better in that department in terms of performance" than JAX because the latter doesn't even support said functionality.

jpsamaroo 1932 days ago

> JAX is even considered to do better in that department in terms of performance than all of Julia, so what are you talking about?

Please provide sources for this claim.

UncleOxidant 1932 days ago

> There's sadly no performant autodiff system for general purpose Python.

Like there is for general purpose Julia code? (https://github.com/FluxML/Zygote.jl)

> I have nothing against supporting GPUs (although I think their use is overrated and most people would do fine with CPUs),

Do you run much machine learning code? All those matrix multiplications run a good bit faster on the GPU.

UncleOxidant 1932 days ago

> Oh, and you can autodiff everything.

Well, not everything. Julia's Zygote AD system can autodiff most Julia code (currently with the exception of code that mutates arrays/matrices).

mpfundstein 1931 days ago

And you didn't even mention data and model parallelism, which often just work out of the box.

thecleaner 1933 days ago

It's Python wrappers on top of the existing ThTensor library, which was already provided by Torch. But yes, great engineering nonetheless.

rrss 1933 days ago

I don't think this is a particularly accurate description of PyTorch in 2021. Yeah, the original C++ backend came from Torch, but I think most of that has been replaced. AFAIK, all the development of the C++ backend for PyTorch over the last several years has been done as part of the PyTorch project; it's not just Python wrappers at this point.

microtonal 1932 days ago

What I like about PyTorch is that most of the functionality is actually available through the C++ API as well, which has 'beta API stability' as they call it. So, there are good bindings for some other languages as well. E.g., I have been using the Rust bindings in a larger project [1], and they have been awesome. A precursor to the project was implemented using TensorFlow, which was a world of pain.

Even things like mixed-precision training are fairly easy to do through the API.

[1] https://github.com/tensordot/syntaxdot
