by dpoljak 1272 days ago
The efficiency of Apple silicon is a matter of fact now, but isn't Nvidia, with CUDA, still king in this segment? Please correct me if I'm wrong, but doing ML/DL on the CPU instead of the GPU seems to be the least efficient way to go about it?
3 comments

Note that I was talking about the CPU specifically. The GPU on Apple silicon is also more efficient (approximately 0.25 TFLOPS/W for the M1 series), but Apple GPUs lack support for ML-optimized floating-point formats such as BF16, which are the primary reason Nvidia is so good in this domain. Apple does have a matrix coprocessor (AMX) that offers excellent performance per watt for inference, but these units are relatively small and offer only limited aggregate performance.
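To put a figure like 0.25 TFLOPS/W in concrete terms: 1 TFLOPS/W is 10^12 FLOP per joule, i.e. 1 FLOP per picojoule, so the reciprocal is the energy cost of one operation. A minimal sketch of that arithmetic; the helper names and the 10^15 FLOP workload are hypothetical, chosen only for illustration.

```python
def picojoules_per_flop(tflops_per_watt: float) -> float:
    # 1 TFLOPS/W == 1e12 FLOP/J == 1 FLOP/pJ, so pJ per FLOP is the reciprocal.
    return 1.0 / tflops_per_watt


def workload_energy_joules(total_flop: float, tflops_per_watt: float) -> float:
    # Energy (J) = work (FLOP) / efficiency (FLOP per J).
    return total_flop / (tflops_per_watt * 1e12)


if __name__ == "__main__":
    eff = 0.25  # TFLOPS/W, the approximate M1-series GPU figure quoted above
    print(picojoules_per_flop(eff))           # energy per operation in pJ
    print(workload_energy_joules(1e15, eff))  # hypothetical 1e15 FLOP job, in J
```

At 0.25 TFLOPS/W this works out to 4 pJ per FLOP, or about 4 kJ for a 10^15 FLOP job; it says nothing about sustained throughput, which is where the "limited aggregate performance" caveat bites.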

I think it’s just a question of time until Apple offers hardware support for BFLOAT16 and other formats on the GPU and AMX (they already have BFLOAT16 on the CPU), at which point their ML performance will improve dramatically.
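For readers unfamiliar with BFLOAT16: it is simply the top 16 bits of an IEEE float32 (sign, the full 8-bit exponent, and 7 mantissa bits), so it keeps float32's range while giving up precision. A minimal software sketch of the conversion using only the standard library, assuming round-to-nearest-even and inputs representable as float32; the function names are hypothetical and this illustrates the format, not how any Apple or Nvidia hardware implements it.

```python
import math
import struct


def float_to_bf16_bits(x: float) -> int:
    """Round a float32-representable value to a 16-bit bfloat16 pattern."""
    if math.isnan(x):
        return 0x7FC0  # canonical quiet NaN
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest, ties to even, on the 16 low bits being discarded.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 bit pattern back to float32 by appending 16 zero bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


if __name__ == "__main__":
    for v in (1.0, math.pi, 1e38):
        b = float_to_bf16_bits(v)
        print(f"{v!r:>24} -> 0x{b:04X} -> {bf16_bits_to_float(b)!r}")
```

Note that 1e38 survives the round trip with only about 2-3 significant digits lost, whereas IEEE float16 (`struct` format `"e"`) cannot represent it at all; that preserved range is why BF16 is attractive for training.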

> nVidia with its cuda still king in this segment

I suspect this will gradually change, especially now that a lot of effort has gone into bringing tooling such as PyTorch to Apple silicon.
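In PyTorch, that Apple silicon support shows up as the `mps` device (the Metal Performance Shaders backend, added in PyTorch 1.12). A minimal sketch of choosing the best available device, assuming PyTorch may or may not be installed; `pick_device` is a hypothetical helper, not part of PyTorch's API.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring a GPU backend when present."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"  # Nvidia GPU via CUDA
    # torch.backends.mps only exists on PyTorch >= 1.12, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple silicon GPU via Metal
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Typical usage would be `model.to(pick_device())` and moving input tensors to the same device; the same script then runs on an Nvidia box, an Apple silicon Mac, or a plain CPU.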

> on CPU instead of GPU

But Apple isn't doing it on CPU.

You are thinking in terms of x86 discrete components.

Apple Silicon is a fully integrated architecture including unified memory. That's what makes it so efficient.

Yes, Nvidia still obliterates M1/M2 in deep learning. M1 is close to a GTX 1650 in real-world DL workloads, even though its theoretical TFLOPS would put it around a GTX 1070.