Hacker News new | ask | show | jobs
by vessenes 1152 days ago
Yes. Llama.cpp uses the CPU to do inference. MPS is the GPU for the macbook, so it has highly performant cores which can be used to do the computation. When you get inference done on the GPU, there's no (less?) energy wasted on general compute type work. :)