by adrian_b
473 days ago
Partial correction: an Epyc CPU with 12 memory channels has a peak bandwidth of 576 GB/s, i.e. DDR5-6000 x 768 bits. That is 70% of the Apple memory bandwidth, but with possibly much more memory (768 GB in your example). You do not need 2 CPUs. If, however, you use 2 CPUs, then the memory bandwidth doubles to 1152 GB/s, exceeding Apple's memory bandwidth by 40%. The cost of the memory would be about the same if you use 16 GB modules, but the motherboard would be more expensive and the second CPU would add to the price.
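For anyone who wants to check the arithmetic, here is a rough sketch of the peak-bandwidth calculation. It assumes 64 data bits per channel (12 x 64 = 768 bits) and counts theoretical peak only, not what software actually achieves. The function name and the `APPLE_GBS` value of 819 GB/s are my own assumptions for illustration; the comment only gives the 70% ratio, not the Apple figure itself.

```python
def peak_bandwidth_gbs(transfers_mt_s: int, channels: int, bits_per_channel: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    # Bytes moved per transfer across all channels combined.
    bytes_per_transfer = channels * bits_per_channel / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s.
    return transfers_mt_s * bytes_per_transfer / 1000


# DDR5-6000 on 12 channels, as in the comment above.
single_socket = peak_bandwidth_gbs(6000, 12)

# Two sockets double the aggregate peak, assuming each CPU has its own 12 channels.
dual_socket = 2 * single_socket

# ASSUMPTION: Apple bandwidth figure is not stated in the thread; 819 GB/s is
# used here only because it is consistent with the quoted 70% ratio.
APPLE_GBS = 819.0

print(f"single socket: {single_socket:.0f} GB/s ({single_socket / APPLE_GBS:.0%} of Apple)")
print(f"dual socket:   {dual_socket:.0f} GB/s ({dual_socket / APPLE_GBS:.0%} of Apple)")
```

These are aggregate peak numbers. Whether a given program sees the dual-socket figure depends on how it places memory and threads across the two sockets.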
|
I don't believe the memory bandwidth doubles in practice. See this issue for a graph with single-socket and dual-socket measurements; there is essentially no difference: https://github.com/abetlen/llama-cpp-python/issues/1098
This may be out of date, but I also know that with llama.cpp, 2x 4090s don't give you more tokens per second than 1x 4090, just more memory capacity.
(All of this applies only to llama.cpp. I have no experience with other software or with how memory bandwidth may scale across sockets there.)