by PeterStuer 316 days ago
Does your inference framework target the NPU or just GPU/CPU?

It's linking llama.cpp and using Metal, so I presume GPU/CPU only.

I'm more than a bit overwhelmed with what's on my plate and have completely missed the boat on, e.g., understanding what MLX is. I'm really curious for a thought dump if you have opinionated experience or thoughts here (e.g., it never crossed my mind until now that you might get better results on the NPU than on the GPU).

LM Studio seems to have MLX support on Apple silicon, so you could quickly get a feel for whether it helps in your case: https://github.com/lmstudio-ai/mlx-engine