The M1 GPU delivers 2.6 TFLOPS (FP32). The 1080 Ti delivers 11.3 TFLOPS.
The M1 GPU isn't really that powerful; it's comparable to an Nvidia 760 (from 2013). The M1's Neural Engine does have more kick, of course, but otherwise the GPU is nothing superb (other than marketing).
While I agree in general, I do want to point out that this is still a lightweight, entry-level laptop SoC, compared to the desktop GPU you've mentioned. It can also run fanless, as in the M1 MacBook Air.
Even so, the 1050 Ti draws around 75 W (2016) and the 760 has a TDP of 170 W (2013), while the M1 GPU draws much less than 10 W at full load (the entire SoC, not just the GPU, peaks at 16 W in the Mac mini).
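To put the perf/watt point in numbers, here is a rough sketch that divides peak FP32 throughput by a power figure. The TFLOPS values are the ones quoted in this thread; the 10 W for the M1 GPU is the upper bound mentioned above, and the 250 W TDP for the 1080 Ti is my own assumption, not a figure from the thread. Peak TFLOPS and TDP are both paper numbers, so treat the result as a ballpark only.

```python
# Peak FP32 throughput (TFLOPS) and power (W) per chip.
# TFLOPS values are quoted from the thread. The 10 W for the M1 GPU is the
# thread's upper bound; the 250 W TDP for the 1080 Ti is an assumption.
CHIPS = {
    "M1 GPU": (2.6, 10.0),
    "GTX 1080 Ti": (11.3, 250.0),
}


def gflops_per_watt(tflops: float, watts: float) -> float:
    """Convert peak TFLOPS and a power figure into GFLOPS per watt."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return tflops * 1000.0 / watts


def efficiency_table(chips):
    """Map each chip name to its GFLOPS-per-watt estimate."""
    return {name: gflops_per_watt(t, w) for name, (t, w) in chips.items()}


if __name__ == "__main__":
    for name, eff in efficiency_table(CHIPS).items():
        print(f"{name}: {eff:.1f} GFLOPS/W")
```

With these inputs the M1 GPU lands around 260 GFLOPS/W against roughly 45 GFLOPS/W for the 1080 Ti, and since the M1 draws less than 10 W its real figure would be higher still. How much of that gap is process node versus architecture is exactly the open question.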
It would be interesting to see what Apple does when it scales this up to 75 W or more for its own custom desktop GPUs, which are rumored to be in development. However, a separate desktop GPU would lose the benefits of UMA that make the M1 fast.
But they built their GPU on the world-leading TSMC 5nm process. I wonder whether the M1 GPU's perf/watt comes mostly from that process advantage. I'd like to see a comparison with the Kirin 9000 (maybe one published by AnandTech).
I mean, my entry deep learning machine was a refurbished Linux box with a pile of gpus I found in the electronics recycling bin at work. Later upgraded to a 1080 and then a 2080, as one does.
The nice thing about the desktop is that it can just train for days and I don't need to worry about using it for other things, losing time moving locations, etc. It's also still probably cheaper than the M1 laptop, even with a nicer GPU than what you can find in the trash.
I did the same (well, used a discarded GPU that my kids didn't need any more) and I have one caution: If you're using PyTorch, you'll want to have a CPU that at least supports AVX. The C++ libraries that ship with PyTorch assume AVX at compile time and don't have an option to disable at runtime. It's a PITA to recompile the entire stack.
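If you want to check an old box before installing anything, a minimal sketch like the one below works on Linux. It assumes the kernel exposes CPU features in /proc/cpuinfo under a "flags" line (x86) or "Features" line (ARM); the function names are made up for this example and have nothing to do with PyTorch itself.

```python
import os


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features under "flags"; ARM uses "Features".
        if sep and key.strip().lower() in ("flags", "features"):
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """Return True if the 'avx' feature flag is listed."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            print("AVX supported:", has_avx(f.read()))
    else:
        print("No /proc/cpuinfo here; check the CPU flags another way.")
```

On macOS or Windows this file does not exist, so you would need a different source for the flags.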
When it comes to Apple's mobile GPUs (and now M1) their main advantage is that they have a custom architecture and compiler and they're both the result of lots of experts working hard on both and leveraging existing work (in the architecture case, the fact that they started from PowerVR's world-class design, and in the compiler case, the fact that their shader compiler is based on LLVM). They're really well-designed chips. Unfortunately there's no particular reason that they would be better at ML than any other GPU, since they had no reason to consider it in the design process until somewhat recently.
Also, besides memory speed, which will certainly be an important factor, it might depend on the model architecture, fp32 vs. fp16, etc. Just as you cannot say CPU 1 is faster than CPU 2 without knowing the type of workload, the same goes for deep learning benchmarks.
Performance in games typically doesn't have much correspondence with how good a GPU will be at compute. Rendering games, videos, etc is largely constrained by things like memory bandwidth, culling, image decoding and blending - all stuff typically done with special-purpose hardware that won't be very useful for compute or neural nets. Sometimes games or media have complex shaders that are entirely limited by compute, but it's not terribly common.
Apple's dedicated ML hardware is probably quite good, but we don't have any way to know how good without doing math on die size + power draw and running benchmarks.
The price doesn't mean much: the 1080 is pretty old at this point and the M1 has considerably smaller transistors. Considering how expensive Apple products are, it's quite possible the cost of an M1 chip isn't much lower than that of the core shipped inside the 1080. A lot of what you pay for on a 1080 is cooling, display output, and power delivery.
Indeed, some of the cost of a 1080 at this point may be that the demand is constrained to people that want replacement parts for uniform deployments...
They are in a comparable TFLOPS range.
The 1080 mobile is way faster than the 1060 (60-90% faster [1]) and definitely ahead of the M1 by a large factor.
[1]: https://gpu.userbenchmark.com/Compare/Nvidia-GTX-1080-Mobile...