by int_19h
1153 days ago
llama.cpp doesn't actually use the GPU. What makes it that much faster is that the workload, whether on CPU or on GPU, is very memory-bandwidth-intensive, so it benefits greatly from fast RAM. And Apple is using LPDDR5 on a wide bus for its shared (unified) memory, so bandwidth is very high. It's still noticeably slower than a GPU, though.
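
To put rough numbers on the "memory-intensive" point, here's a back-of-envelope sketch. It assumes generating one token means reading every weight from RAM once, so memory bandwidth divided by model size gives a ceiling on tokens per second. The function names and the figures in the usage lines (7B parameters, 4 bits per weight, 50 and 100 GB/s) are illustrative placeholders I picked, not measurements of any particular machine or of llama.cpp itself.

```python
def weights_size_gb(n_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params_billions * bits_per_weight / 8


def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Ceiling on tokens/s, assuming each token reads all weights once.

    This ignores compute time, caches, and the KV cache, so real
    throughput will be lower than this bound.
    """
    if weights_gb <= 0:
        raise ValueError("weights_gb must be positive")
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    # Hypothetical example: a 7B-parameter model quantized to 4 bits per weight.
    size = weights_size_gb(7, 4)
    # Hypothetical bandwidth figures, chosen only to show the scaling.
    for bandwidth in (50.0, 100.0):
        bound = max_tokens_per_second(bandwidth, size)
        print(f"{bandwidth:.0f} GB/s -> at most {bound:.1f} tokens/s")
```

Under that assumption the bound scales linearly with bandwidth, which is why faster RAM helps so much here regardless of whether the math runs on the CPU or the GPU.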