

by llm_nerd 1107 days ago

I certainly can't speak to your specific uses or issues, but I mean we've really moved the goalposts from the prior claim that it didn't have tensor (e.g. matrix) functionality.

My daily work life includes a lot of model running on Apple hardware (Apple Silicon and A1# chips with the Neural Engine) using CoreML, often PyTorch models converted with coremltools. The performance of the Apple chips is spectacular if the intrinsics are supported (things obviously get dicier when a model uses currently unsupported ops). I mean, the M2 Ultra's 800 GB/s of memory bandwidth is within spitting distance of the roughly 1 TB/s on the GDDR6X-equipped RTX 4090.
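To put "within spitting distance" in numbers, here is a back-of-envelope sketch. It assumes the published peak figures (800 GB/s for the M2 Ultra's unified memory, 1,008 GB/s for the RTX 4090's GDDR6X) and a hypothetical 7B-parameter model stored in fp16, and computes the minimum time to read every weight once, which roughly bounds batch-1 inference when it is memory-bound. Real-world throughput is lower on both machines, and the sketch ignores compute limits and caching.

```python
# Assumed peak memory-bandwidth figures from vendor specs, in GB/s.
M2_ULTRA_GBPS = 800.0    # Apple M2 Ultra unified memory
RTX_4090_GBPS = 1008.0   # NVIDIA RTX 4090: 384-bit GDDR6X at 21 Gbps


def stream_time_ms(model_bytes: float, bandwidth_gbps: float) -> float:
    """Lower bound on the time to read every weight once, in milliseconds."""
    return model_bytes / (bandwidth_gbps * 1e9) * 1e3


# Hypothetical model: 7B parameters at 2 bytes each (fp16) = 14 GB.
weights_bytes = 7e9 * 2

ratio = M2_ULTRA_GBPS / RTX_4090_GBPS
m2_ms = stream_time_ms(weights_bytes, M2_ULTRA_GBPS)
rtx_ms = stream_time_ms(weights_bytes, RTX_4090_GBPS)

print(f"bandwidth ratio (M2 Ultra / RTX 4090): {ratio:.2f}")
print(f"M2 Ultra: {m2_ms:.1f} ms per full pass over the weights")
print(f"RTX 4090: {rtx_ms:.1f} ms per full pass over the weights")
```

Under these assumptions the M2 Ultra has about 79% of the 4090's peak bandwidth: roughly 17.5 ms versus 13.9 ms to stream the weights once.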

People aren't going to replace H100 arrays with Apple Silicon, and even as a fan I use NVIDIA hardware for training and convert the models to CoreML after the fact. But Apple clearly isn't satisfied with being a toy; they are continually climbing up that vine.


bufo 1107 days ago

Yes, you’re correct that the ANE does have the equivalent of tensor cores, and I didn’t mention that. I just don’t expect it to be usable beyond inference, because its number of compute units won’t support batches for medium, large, or huge networks. That’s obviously by design! The ANE’s silicon area is tiny compared to the GPU’s. I actually wouldn’t be surprised if Apple strategically invests only in their GPU for LLM (1B+ params) work.

Note that if you are currently using CoreML for LLMs, all the work is done on the GPU.