by munro 1491 days ago
That /sounds/ right, but training still has a forward part, so OP does raise a really great question. And looking at the silicon, the neural engine is almost the size of the GPU. Really need someone educated in this area to chime in :)
2 comments

You have to stash more information from the forward pass in order to calculate the gradients during backprop. You can't just naively use an inference accelerator as part of training - inference-only gets to discard intermediate activations immediately.
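To make the "stash more information" point concrete, here is a minimal sketch using a made-up two-weight scalar network (`w1`, `w2`, and the `tanh` layer are illustrative assumptions, not anything from a real accelerator or framework). The inference path returns only the output, while the training path has to hand back a cache of intermediates that the backward pass reads.

```python
import math

def forward_inference(x, w1, w2):
    """Inference: the hidden activation is a temporary and can be dropped
    as soon as the next layer has consumed it."""
    h = math.tanh(w1 * x)
    return w2 * h

def forward_training(x, w1, w2):
    """Training forward pass: same arithmetic, but the input and hidden
    activation are kept because the gradients depend on them."""
    h = math.tanh(w1 * x)
    y = w2 * h
    cache = {"x": x, "h": h}  # extra memory that inference never needs
    return y, cache

def backward(dy, cache, w2):
    """Backprop for the toy network; dy is dLoss/dy."""
    x, h = cache["x"], cache["h"]
    dw2 = dy * h                   # needs the stashed hidden activation
    dpre = dy * w2 * (1 - h * h)   # tanh'(z) = 1 - tanh(z)^2, again needs h
    dw1 = dpre * x                 # needs the stashed input
    return dw1, dw2
```

In a real network the cache holds one tensor per layer per example, which is why hardware that streams activations through and throws them away does not drop straight into a training loop.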

(Also, many inference accelerators use lower precision than you do when training)
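One reason the precision gap matters for training, as a rough illustration: small weight updates can vanish entirely when the weight is stored in half precision. This sketch emulates IEEE fp16 rounding with Python's `struct` module; the update size and step count are arbitrary assumptions chosen to show the effect, and real mixed-precision setups work around this with higher-precision master weights.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]

def accumulate(weight, update, steps, half_precision):
    """Apply the same small update repeatedly, optionally rounding the
    stored weight to fp16 after every step."""
    for _ in range(steps):
        weight = weight + update
        if half_precision:
            weight = to_fp16(weight)
    return weight

# Near 1.0 the fp16 grid spacing is about 0.001, so an update of 0.0001
# rounds straight back to 1.0 every time.
w_half = accumulate(1.0, 1e-4, steps=1000, half_precision=True)
w_full = accumulate(1.0, 1e-4, steps=1000, half_precision=False)
```

Here `w_half` never moves off 1.0 while `w_full` ends up near 1.1, even though both saw identical updates.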

There are tricks you can do to use inference to accelerate training, such as one we developed to focus on likely-poorly-performing examples: https://arxiv.org/abs/1910.00762
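The general shape of that trick, as a loose sketch: run a cheap forward-only pass over the batch, rank examples by loss, and spend the expensive backward pass only on the worst ones. Everything here is a stand-in: the one-weight linear model, the function names, and the deterministic top-k cut are assumptions for illustration and are not taken from the linked paper, whose actual selection rule may differ.

```python
import math

def predict(w, x):
    """Toy model: a single weight times the input."""
    return w * x

def losses_forward_only(w, batch):
    """Forward-only scoring; this is the part an inference engine could run."""
    return [0.5 * (predict(w, x) - t) ** 2 for x, t in batch]

def select_hardest(batch, losses, keep_fraction):
    """Keep the highest-loss examples (a simple top-k stand-in)."""
    k = max(1, math.ceil(keep_fraction * len(batch)))
    ranked = sorted(range(len(batch)), key=lambda i: losses[i], reverse=True)
    return [batch[i] for i in ranked[:k]]

def sgd_step(w, examples, lr):
    """Full forward + backward, but only on the selected examples."""
    grad = sum((predict(w, x) - t) * x for x, t in examples) / len(examples)
    return w - lr * grad

def train_step(w, batch, keep_fraction=0.5, lr=0.1):
    losses = losses_forward_only(w, batch)
    hard = select_hardest(batch, losses, keep_fraction)
    return sgd_step(w, hard, lr)
```

With `keep_fraction=0.5`, half of the backward passes are skipped per batch; whether that is a net win depends on how cheap the scoring pass is relative to the backward pass.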

The neural engine is only exposed through a CoreML inference API.

You can't even poke the ANE hardware directly from a regular process. The interface for accessing the neural engine is not hardened (you can easily crash the machine from it).

So the matter is essentially moot in practice as you'd need your users to run with SIP off...

That doesn't seem to be a huge issue. If someone actually does this for income, would they really avoid disabling SIP for, say, a 2x performance gain?
Sounds like you've done a bit of digging around; your efforts are appreciated. I also found a GitHub of people sharing what they know, and here's a guy live streaming hacking it and building tinygrad: https://youtu.be/mwmke957ki4