by noduerme 1199 days ago
Does the llama code that dropped leverage the GPU at all? On an M1 it appears to just run on as many CPU cores as you want to throw at it. The 65B heats up 8 cores real nicely, and it's slow, but I imagine it would be a lot faster on the GPU.
2 comments

I've seen people saying that limiting it to 4 cores out of the 8 total can actually lead to improved performance. Have you seen that?
Running on 8 cores starts out a bit faster for me if the machine is plugged in, at least until the fan kicks on and the CPU starts throttling. Once that happens it's probably better to stick with 4.
All of the llama implementations for Apple are CPU only afaik.
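On the 4-vs-8 question above: a rough sketch of picking a thread count from the number of performance cores instead of guessing. It assumes the `hw.perflevel0.physicalcpu` sysctl key reports performance cores on Apple Silicon macOS; the helper names `performance_core_count` and `pick_thread_count` are made up for illustration and aren't part of any llama implementation. Anywhere the key isn't available it just falls back to the logical CPU count.

```python
import os
import subprocess


def performance_core_count(default=None):
    """Best-effort count of performance cores.

    Assumption: on Apple Silicon macOS, `sysctl -n hw.perflevel0.physicalcpu`
    prints the number of performance cores. If the command is missing, fails,
    or prints something unexpected, fall back to `default` (or to all
    logical CPUs when no default is given).
    """
    fallback = default if default is not None else (os.cpu_count() or 1)
    try:
        out = subprocess.run(
            ["sysctl", "-n", "hw.perflevel0.physicalcpu"],
            capture_output=True,
            text=True,
            check=True,
            timeout=5,
        ).stdout.strip()
        n = int(out)
        return n if n > 0 else fallback
    except (OSError, subprocess.SubprocessError, ValueError):
        return fallback


def pick_thread_count(total_cores, perf_cores):
    """Clamp the thread count to the performance cores, with a floor of 1."""
    return max(1, min(total_cores, perf_cores))


if __name__ == "__main__":
    total = os.cpu_count() or 1
    perf = performance_core_count()
    print(f"logical CPUs: {total}, suggested threads: {pick_thread_count(total, perf)}")
```

You'd pass the suggested number to whatever thread-count option your llama build exposes. It's only a starting point; as noted above, throttling can change which setting is actually faster, so it's worth timing both.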