| HN Mirror



	by mehmetoguzderin 1571 days ago
	> The 3090 also can do fp16 and the M1 series only supports fp32

	Apple Silicon (including base M1) actually has great FP16 support at the hardware level, including conversions. So it is wrong to say it only supports FP32.

2 comments

oneplane 1571 days ago

I'm not sure if he was talking about the ML engine, the Arm cores, the microcode, the library or the OS. But it does indeed have FP16 in the Arm cores.


inkyoto 1571 days ago

FP16 is supported on the M1 GPUs and Neural Engines through the Core ML framework. From https://coremltools.readme.io/docs/typed-execution :

> The Core ML runtime dynamically partitions the network graph into sections for the Apple Neural Engine (ANE), GPU, and CPU, and each unit executes its section of the network using its native type to maximize its performance and the model’s overall performance. The GPU and ANE use float 16 precision, and the CPU uses float 32.

Also, this exploration (https://tlkh.dev/benchmarking-the-apple-m1-max#heading-neura...) reports FP16 performance in the 5.1-5.3 TFLOPS ballpark.
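
To make the float 16 vs float 32 distinction in the quote concrete, here is a minimal sketch that rounds values through both formats using only Python's standard `struct` module. It emulates the number formats on the CPU and does not touch Core ML, the GPU, or the ANE; the helper names (`round_to`, `to_fp16`, `to_fp32`) are made up for this illustration.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Round a Python float (float64) to the given IEEE 754 format and back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def to_fp16(x: float) -> float:
    # "e" is struct's half-precision (binary16) format code.
    return round_to("<e", x)

def to_fp32(x: float) -> float:
    # "f" is struct's single-precision (binary32) format code.
    return round_to("<f", x)

x = 0.1
err16 = abs(to_fp16(x) - x)  # rounding error when 0.1 is stored as float16
err32 = abs(to_fp32(x) - x)  # rounding error when 0.1 is stored as float32
print("fp16 error:", err16)
print("fp32 error:", err32)

# float16 has an 11-bit significand, so not every integer above 2048 fits.
print("2049 as fp16:", to_fp16(2049.0))
print("2049 as fp32:", to_fp32(2049.0))
```

The point is only that each unit's "native type" trades precision for throughput; what Core ML does internally is described in the linked docs, not by this snippet.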


apohn 1571 days ago

I should have been clearer. I didn't mean the hardware, but the speedup you get from using mixed precision in something like TensorFlow with an NVIDIA GPU.
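
For anyone unfamiliar with what "mixed precision" means here: the usual idea is to do the heavy math in fp16 while keeping the weights in fp32, because small updates vanish if the weights themselves are stored in fp16. Below is a rough sketch of that effect using only Python's `struct` module to emulate the rounding. It is not TensorFlow code and says nothing about speed; the `train` function and its parameters are hypothetical, chosen just for this illustration.

```python
import struct

def fp16(x: float) -> float:
    """Round to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def fp32(x: float) -> float:
    """Round to IEEE 754 single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def train(steps: int, update: float, store) -> float:
    """Add a small fp16 update to a weight `steps` times.

    `store` is the rounding function for the stored weight:
    fp16 emulates pure half precision, fp32 emulates fp32 master weights.
    """
    w = store(1.0)
    for _ in range(steps):
        step = fp16(update)   # the update itself is computed in half precision
        w = store(w + step)   # the stored weight is rounded to the storage format
    return w

pure_fp16 = train(1000, 1e-4, fp16)  # weight kept in fp16
mixed = train(1000, 1e-4, fp32)      # weight kept in fp32

print("pure fp16 weight:", pure_fp16)
print("mixed precision weight:", mixed)
```

Near 1.0 the gap between adjacent fp16 values is about 0.001, so an update of 1e-4 rounds away every time in the pure fp16 run, while the fp32 copy accumulates all 1000 updates.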


apohn 1571 days ago

Thanks. At least when I ran the benchmarks with TensorFlow, using mixed precision resulted in the CPU being used for training instead of the GPU on the M1 Pro. So if the hardware for fp16 is there and they implement the software support for DL frameworks, that will be great.


mehmetoguzderin 1571 days ago

Yes, unfortunately, the software is to blame for the time being, and I also ran into issues myself. :\ I hope it catches up with what the hardware can deliver, on both the GPU and the Neural Engine.
