by anentropic 1159 days ago
I saw this: https://github.com/jankais3r/LLaMA_MPS

It runs slightly slower on the GPU than under llama.cpp, but uses much less power doing so.

I would guess the slowness is due to the immaturity of the PyTorch MPS backend. The asitop graphs show it doing a fair amount of CPU work alongside the GPU, so it might be inefficiently falling back to the CPU for some ops and swapping layers back and forth (I have no idea, just guessing).
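
For anyone who wants to poke at the fallback guess, here is a rough sketch of how one might check the setup. It only reports whether PyTorch sees the MPS device and whether the `PYTORCH_ENABLE_MPS_FALLBACK` environment variable (which lets ops without an MPS kernel run on the CPU) is set. The function names `describe_mps_setup` and `pick_device` are made up for illustration, and this says nothing about what LLaMA_MPS itself does.

```python
import os


def describe_mps_setup():
    """Report whether PyTorch can see the Apple GPU (MPS) on this machine."""
    info = {
        "torch_installed": False,
        "mps_built": False,
        "mps_available": False,
        # PYTORCH_ENABLE_MPS_FALLBACK=1 asks PyTorch to run ops that have
        # no MPS kernel on the CPU instead of raising an error.
        "cpu_fallback_enabled": os.environ.get("PYTORCH_ENABLE_MPS_FALLBACK") == "1",
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; leave the defaults as they are.
        return info

    info["torch_installed"] = True
    # Older PyTorch builds have no torch.backends.mps module at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None:
        info["mps_built"] = bool(mps.is_built())
        info["mps_available"] = bool(mps.is_available())
    return info


def pick_device(info):
    """Hypothetical helper: prefer the MPS device, otherwise use the CPU."""
    return "mps" if info["mps_available"] else "cpu"


if __name__ == "__main__":
    info = describe_mps_setup()
    print(info, "->", pick_device(info))
```

If the fallback variable is set and asitop still shows heavy CPU use, that would at least be consistent with some ops bouncing to the CPU, though it would not prove it.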

1 comment

Hey, thanks so much. That solidifies the case for a 128 GB Mac Studio. Apple could be selling a bunch of these things with these high-RAM capabilities.