


	by noduerme 1199 days ago
	Does the llama code that dropped leverage the GPU at all? On an M1 it appears to just run on as many CPU cores as you want to throw at it. The 65B heats up 8 cores real nicely, and it's slow, but I imagine it would be a lot faster on the GPU.

2 comments

Tostino 1199 days ago

I've seen people saying that limiting it to 4 cores out of the 8 total can actually lead to improved performance. Have you seen that?


noduerme 1199 days ago

With 8 cores it starts and runs a bit faster for me when plugged in, until the fan kicks on and the CPU starts throttling. Once that happens it's probably better to stick with 4.
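
A rough way to check the 4-vs-8 question on your own machine is to time the same job at each thread count and compare. The sketch below is only a timing harness: `run` is a hypothetical callable that you would supply to launch inference with a given number of threads, and `time_thread_counts` and `fastest` are made-up helper names, not part of any llama implementation. It averages over several repeats rather than taking the best run, on the assumption that throttling shows up in the later runs.

```python
import time
from statistics import mean
from typing import Callable, Dict, Iterable


def time_thread_counts(
    run: Callable[[int], None],
    counts: Iterable[int],
    repeats: int = 3,
) -> Dict[int, float]:
    """Return the mean wall-clock time in seconds for each thread count."""
    results: Dict[int, float] = {}
    for n in counts:
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(n)  # hypothetical: start one inference run using n threads
            timings.append(time.perf_counter() - start)
        # Mean rather than min, so slow (possibly throttled) runs still count.
        results[n] = mean(timings)
    return results


def fastest(results: Dict[int, float]) -> int:
    """Pick the thread count with the lowest mean time."""
    return min(results, key=results.get)
```

Calling `time_thread_counts(run, [4, 8])` and then `fastest(...)` on the result gives the thread count that did better on average. Short runs may finish before the fan kicks on, so the workload needs to be long enough to reach steady state for the comparison to mean much.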


brianjking 1198 days ago

All of the llama implementations for Apple are CPU only afaik.
