Hacker News new | ask | show | jobs
by fooblaster 244 days ago
MLX doesn't use the neural engine still right? I still wish they would abandon that unit and just center everything around metal and tensor units on the GPU.
3 comments

MLX is a training/research framework, and the work product is usually a CoreML model. A CoreML model will use any and all resources that are available to it, at least if the resource fits for the need.

The ANE is for very low power, very specific inference tasks. There is no universe where Apple abandons it, and it's super weird how much anti-ANE rhetoric there is on this site, as if there can only be one tool for an infinite selection of needs. The ANE is how your iPhone extracts every bit of text from images and subject matter information from photos with little fanfare or heat, or without destroying your battery, among many other uses. It is extremely useful for what it does.

>tensor units on the GPU

The M5 / A19 Pro are the first chips with so-called tensor units. e.g. matmul on the GPU. The ANE used to be the only tensor-like thing on the system, albeit as mentioned designed to be super efficient and for very specific purposes. That doesn't mean Apple is going to abandon the ANE, and instead they made it faster and more capable again.

> ...and it's super weird how much anti-ANE rhetoric there is on this site, as if there can only be one tool for an infinite selection of needs

That seems like a strange comment. I've remarked in this thread (and other threads on this site) about what's known re: low-level ANE capabilities, and it seems to have significant potential overall, even for some part of LLM processing. I'm not expecting it to be best-in-class at everything, though. Just like most other NPUs that are also showing up on recent laptop hardware.

> the work product is usually a CoreML model.

What work product? Who is running models on Apple hardware in prod?

An enormous number of people and products. I'm actually not sure if your comment is serious, because it seems to be of the "I don't, therefore no one does" variety.
Enormous compared to what? Do you have any numbers, or are you going off what your X/Bluesky feed is telling you?
I'm super not interested in arguing with the peanut gallery (meaning people who don't know the platform but feel that they have absolute knowledge of it), but enough people have apps with CoreML models in them, running across a billion or so devices. Some of those models were developed or migrated with MLX.

You don't have to believe this. I could not care less if you don't.

Have a great day.

I don't believe it. MLX is a proprietary model format and usually the last to get supported on Huggingface. Given that most iOS users aren't selecting their own models, I genuinely don't think your conjecture adds up. The majority of people are likely using safetensors and GGUF, not MLX.

If you had a source to cite then it would remove all doubt pretty quickly here. But your assumptions don't seem to align with how iOS users actually use their phone.

Any iPhone or iPad app that does local ML inference?
Yes please tell us which apps those are
Wand, Polycam, smart comic reader, Photos of course. Those are just the ones on my phone, probably many more.
The keyboard. Or any of the features in Photos.app that do classification on-device.
Oh, I overlooked that! You are right. Surprising… since Apple has shown that it’s possible through CoreML (https://github.com/apple/ml-ane-transformers)

I would hope that the Foundation Models (https://developer.apple.com/documentation/foundationmodels) use the neural engine.

The neural engine not having a native programming model makes it effectively a dead end for external model development. It seems like a legacy unit that was designed for cnns with limited receptive fields, and just isn't programmable enough to be useful for the total set of models and their operators available today.
That's sadly true, over in x86 land things don't look much better in my opinion. The corresponding accelerators on modern Intel and AMD CPUs (the "Copilot PCs") are very difficult to program as well. I would love to read a blog post on someone trying though!
I have a lot of the details there. Suffice to say it's a nightmare:

https://www.google.com/url?sa=t&source=web&rct=j&opi=8997844...

AMD is likely to back away from this IP relatively soon.

Edit: Foundation Models use the Neural Engine. They are referring to a Neural Engine compatible K/V cache in this announcement: https://machinelearning.apple.com/research/introducing-apple...
Wrt. language models/transformers, the neural engine/NPU is still potentially useful for the pre-processing step, which is generally compute-limited. For token generation you need memory bandwidth so GPU compute with neural/tensor accelerators is preferable.
I think I'd still rather have the hardware area put into tensor cores for the GPU instead of this unit that's only programmable with onnx.