Hacker News new | ask | show | jobs
by mehmetoguzderin 1571 days ago
> The 3090 also can do fp16 and the M1 series only supports fp32

Apple Silicon (including base M1) actually has great FP16 support at the hardware level, including conversions. So it is wrong to say it only supports FP32.

2 comments

I'm not sure if he was talking about the ML engine, the ARM cores, the microcode, the library or the OS. But it does indeed have FP16 in the Arm cores.
FP16 is supported in M1 GPU's and Neural Engines through the CoreML framework. From https://coremltools.readme.io/docs/typed-execution :

> The Core ML runtime dynamically partitions the network graph into sections for the Apple Neural Engine (ANE), GPU, and CPU, and each unit executes its section of the network using its native type to maximize its performance and the model’s overall performance. The GPU and ANE use float 16 precision, and the CPU uses float 32.

Also, this exploration (https://tlkh.dev/benchmarking-the-apple-m1-max#heading-neura...) reports the 5.1-5.3 TFLOPS FP16 ballpark performance.

I should have been more clear. I didn't mean the hardware, but the speedup you get from using mixed precision in something like Tensorflow with an NVIDIA GPU.
Thanks. At least when I ran the benchmarks with Tensorflow, using mixed precision resulted in the CPU being used for training instead of the GPU on the M1 Pro. So if the hardware is there for fp16 and they will implement the software support for DL frameworks, that will be great.
Yes, unfortunately, the software is to blame for the time being, and I also ran into issues myself. :\ Hope they catch up to what the hardware delivers well, including both the GPU and the Neural Engine.