by buildbot
473 days ago
Ah, I didn’t realize they’d upped the memory to DDR5-6000 (vs. 4800), thanks for the correction! I don’t believe the memory bandwidth doubles, though. See this issue, which has a graph of single- and dual-socket measurements showing essentially no difference: https://github.com/abetlen/llama-cpp-python/issues/1098 Perhaps this is out of date now, but I also know that with 2x 4090s you don’t get more tokens per second than with 1x 4090 in llama.cpp, just more memory capacity. (All of this only applies to llama.cpp; I have no experience with other software or with how memory bandwidth may scale across sockets there.)
|
With a badly organized program, performance can be limited not by the memory bandwidth, which in aggregate is always exactly double on a dual-socket system, but by the transfers over the inter-socket links.
Moreover, your link is about older Intel Xeon Sapphire Rapids CPUs, which have inferior memory interfaces and more quirks in memory optimization.
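
For a rough sense of the numbers being discussed, here is a back-of-the-envelope sketch of theoretical peak DRAM bandwidth. It assumes 8 bytes per transfer per channel (a 64-bit DDR5 data bus) and a hypothetical 12 channels per socket; the channel count and the helper name `peak_bandwidth_gbs` are illustrative assumptions, not figures from this thread. It only gives the aggregate ceiling and says nothing about what llama.cpp actually achieves across sockets.

```python
def peak_bandwidth_gbs(mt_per_s, channels, sockets=1, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (decimal gigabytes).

    mt_per_s  : memory speed in megatransfers per second (e.g. 6000 for DDR5-6000)
    channels  : memory channels per socket (assumed value, see below)
    sockets   : number of CPU sockets
    bus_bytes : bytes moved per transfer per channel (8 for a 64-bit data bus)
    """
    # MT/s * bytes per transfer = MB/s; divide by 1000 to get GB/s.
    return mt_per_s * bus_bytes * channels * sockets / 1000


CHANNELS = 12  # assumption for illustration only; check your platform's spec

for speed in (4800, 6000):
    for sockets in (1, 2):
        bw = peak_bandwidth_gbs(speed, CHANNELS, sockets)
        print(f"DDR5-{speed}, {sockets} socket(s): {bw:.1f} GB/s peak")
```

The aggregate figure doubles with the second socket by construction. Whether a program sees that depends on each thread reading mostly from memory attached to its own socket, rather than pulling data over the inter-socket links.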