by adrian_b
474 days ago
The memory bandwidth does double, but to exploit it the program must be written and run with careful, NUMA-aware memory placement, so that each core mostly accesses memory attached to its closest memory controller rather than memory attached to the other socket. With a badly organized program, performance can be limited not by the memory bandwidth, which is always exactly double for a dual-socket system, but by transfers over the inter-socket links. Moreover, your link is about older Intel Xeon Sapphire Rapids CPUs, which have inferior memory interfaces and more quirks in memory optimization.
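To make the point about inter-socket links concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes two identical sockets, a symmetric workload, and a single fraction of accesses that go to the other socket. The function name `aggregate_bandwidth` and all the GB/s figures are made up for illustration and do not describe any particular CPU.

```python
def aggregate_bandwidth(local_bw, link_bw, remote_fraction):
    """Estimate total usable memory bandwidth of a dual-socket system.

    local_bw        -- bandwidth of one socket's memory controllers (GB/s)
    link_bw         -- inter-socket link bandwidth per direction (GB/s)
    remote_fraction -- share of each socket's accesses that hit the
                       other socket's memory (0.0 = perfectly NUMA-local)

    Simplifying assumption: both sockets behave identically, so each
    memory controller serves the same total traffic as one socket consumes.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")

    # With no remote traffic, each socket is limited only by its own memory.
    per_socket = local_bw
    if remote_fraction > 0:
        # Remote traffic (remote_fraction * throughput) must fit on the link.
        per_socket = min(local_bw, link_bw / remote_fraction)
    return 2 * per_socket


if __name__ == "__main__":
    # Hypothetical figures: 400 GB/s per socket, 100 GB/s per link direction.
    for r in (0.0, 0.1, 0.5, 1.0):
        total = aggregate_bandwidth(400.0, 100.0, r)
        print(f"remote fraction {r:.1f}: about {total:.0f} GB/s aggregate")
```

In this toy model the full doubling is only reached while remote traffic stays small enough to fit on the link; with naive placement the link, not the memory, sets the ceiling.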
|
But where is your data? For llama.cpp? On whatever dual-socket CPU system you want. That's all I am claiming.