by simonw 1153 days ago
My understanding is that part of it is that Apple Silicon shares all available RAM between CPU and GPU.

I'm not sure how many of these models are actively taking advantage of that architecture yet though.

1 comment

The GPU isn't actually used by llama.cpp. What makes it that much faster is that the workload, whether on CPU or on GPU, is limited mostly by memory bandwidth, so it benefits greatly from fast RAM. And Apple pairs LPDDR5 with a very wide memory bus for this shared memory, which gives it a lot of bandwidth.

It's still noticeably slower than running on a GPU, though.
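
To put rough numbers on the memory-bound point, here is a back-of-envelope sketch in Python. It assumes that generating one token requires reading every model weight from RAM once, so memory bandwidth divided by model size gives a ceiling on tokens per second. The function names are made up for illustration, and the 7B-parameter, 4-bit, 400 GB/s figures are example inputs, not measurements of llama.cpp.

```python
def model_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in bytes (ignores metadata and activations)."""
    return n_params * bits_per_weight / 8


def max_tokens_per_second(bandwidth_bytes_per_s: float, weight_bytes: float) -> float:
    """Upper bound on generation speed, ASSUMING every weight is read from RAM
    once per token and that memory bandwidth is the only bottleneck."""
    return bandwidth_bytes_per_s / weight_bytes


if __name__ == "__main__":
    # Illustrative inputs only: a 7B-parameter model quantized to 4 bits,
    # on a machine with 400 GB/s of memory bandwidth.
    weights = model_bytes(7e9, 4)                 # 3.5e9 bytes
    ceiling = max_tokens_per_second(400e9, weights)
    print(f"weights: {weights / 1e9:.1f} GB, ceiling: {ceiling:.0f} tokens/s")
```

Real throughput will land below this ceiling, since compute, cache behavior, and the KV cache all cost something too. The point is only that doubling the bandwidth roughly doubles the ceiling, while adding cores does not.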