by stoatmagoats
975 days ago
Friendly internet stranger’s input:

- You don’t get GPU acceleration just by using unified memory. Llama.cpp only uses the CPU on Apple Silicon chips unless it is built with Metal support [0].
- The difference in tokens/sec is likely attributable to memory bandwidth. Mac Studios with the base Max chip have 400 GB/s of memory bandwidth, compared to around 50 GB/s for Ryzen 5000 series CPUs.
[0] https://github.com/ggerganov/llama.cpp#metal-build
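A rough back-of-the-envelope sketch of why bandwidth dominates. It assumes that generating each token means streaming roughly all of the model weights from memory once. Under that assumption, bandwidth divided by model size gives an upper bound on tokens/sec. The function name and the 4 GB model size are hypothetical illustration values. The estimate ignores caches, KV-cache reads, and compute limits, so real numbers will be lower.

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on decode speed for a memory-bound model.

    Assumes every weight is read once per generated token and ignores
    caches, KV-cache traffic, and compute limits (all simplifications).
    """
    return bandwidth_gb_s / model_size_gb


# Hypothetical model size: roughly a 4-bit quantized 7B model.
MODEL_GB = 4.0

systems = [
    ("Mac Studio, base Max chip (400 GB/s)", 400.0),
    ("Ryzen 5000, dual-channel DDR4 (~50 GB/s)", 50.0),
]

for name, bandwidth in systems:
    ceiling = max_tokens_per_sec(bandwidth, MODEL_GB)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

Whatever model size you plug in, the ratio between the two ceilings is just the bandwidth ratio (about 8x). That is the point about bandwidth being the likely bottleneck.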