by anentropic
1159 days ago
I saw this: https://github.com/jankais3r/LLaMA_MPS. It runs slightly slower on the GPU than llama.cpp does, but uses much less power doing so. I would guess the slowness is due to the immaturity of the PyTorch MPS backend: the asitop graphs show it doing a bunch of CPU work alongside the GPU, so it might be inefficiently falling back to the CPU for some ops and swapping layers back and forth (I have no idea, just guessing).