by babl-yc 288 days ago
You're right, there is no way to specifically target the Neural Engine. You have to use it via Core ML, which abstracts away the execution.
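
For anyone wondering what that looks like in practice, here's a rough sketch. The closest thing Core ML gives you is a compute-unit preference on `MLModelConfiguration`. It is only a hint: Core ML still decides per layer where things run and can fall back to the CPU. The `makeConfiguration` helper and the `Model.mlmodelc` path are made up for illustration.

```swift
import CoreML

// Hypothetical helper: ask Core ML to prefer the CPU and Neural Engine.
// This is a preference, not a guarantee. Core ML still schedules each
// layer itself and may run unsupported layers on the CPU.
func makeConfiguration() -> MLModelConfiguration {
    let config = MLModelConfiguration()
    if #available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *) {
        // Excludes the GPU; available on newer OS versions only.
        config.computeUnits = .cpuAndNeuralEngine
    } else {
        // Older systems: let Core ML pick among CPU, GPU, and Neural Engine.
        config.computeUnits = .all
    }
    return config
}

// Hypothetical path to a compiled model; replace with a real one.
let modelURL = URL(fileURLWithPath: "Model.mlmodelc")

do {
    let model = try MLModel(contentsOf: modelURL, configuration: makeConfiguration())
    print(model.modelDescription)
} catch {
    print("Could not load model: \(error)")
}
```

Note there's no option for "Neural Engine only", which is the point above: you can rule out the GPU, but you can't force a given layer onto the ANE.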

If you use Metal / GPU compute shaders, it's going to run exclusively on the GPU. Some inference libraries, like TensorFlow Lite / LiteRT with backend = .gpu, use this.

1 comment

Exactly. And most folks are using a framework like llama.cpp, which does control where it’s run.