	by ribit 1272 days ago
	Note that I was talking about the CPU specifically. The GPU on Apple Silicon is also more efficient (approximately 0.25 TFLOPS/watt for the M1 series), but Apple GPUs lack hardware support for ML-optimized floating-point formats such as bfloat16 (the primary reason Nvidia is so strong in this domain). Apple does have a matrix coprocessor (AMX) that offers excellent performance per watt for inference, but these units are relatively small and offer only limited aggregate performance. I think it’s only a matter of time until Apple adds hardware support for bfloat16 and other formats to the GPU and AMX (the CPU already supports BF16), at which point their ML performance will improve dramatically.
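	For anyone unfamiliar with why bfloat16 is called "ML-optimized": it keeps FP32's 8-bit exponent (so roughly the same dynamic range) but cuts the mantissa to 7 bits, so a BF16 value is essentially the upper 16 bits of an FP32 value. Here is a minimal Python sketch of that idea. It is a generic illustration of the format, not a description of how Apple's (or anyone's) hardware converts values. The function names are hypothetical, and round-to-nearest-even is assumed as the conversion rule:

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Round x (as FP32) to BF16 using round-to-nearest-even; return the 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # NaN: exponent all ones and nonzero mantissa. Truncate and force a quiet NaN
    # so rounding can't turn it into infinity.
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF
    # Round to nearest, ties to even, based on the low bit of the kept half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bfloat16_bits_to_float32(b: int) -> float:
    """Widen a BF16 bit pattern back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

if __name__ == "__main__":
    for x in (3.14159265, 1e38):
        b = float32_to_bfloat16_bits(x)
        print(f"{x!r}: BF16 bits 0x{b:04X} -> {bfloat16_bits_to_float32(b)!r}")
    # IEEE FP16 (struct format "e") tops out at 65504, so this value doesn't fit.
    try:
        struct.pack("<e", 1e38)
    except OverflowError:
        print("1e38 overflows FP16 but is representable (coarsely) in BF16")
```

	Pi comes back as 3.140625, which shows the lost precision. 1e38 stays finite, which shows the preserved range. That trade-off is generally acceptable for neural-network weights and activations.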