by jfindley
966 days ago
Clock speed isn't a particularly meaningful measurement anymore, and hasn't been for years. For example, an AMD Genoa chip, depending on SKU, may have base/boost clock speeds fairly comparable to an Intel Sapphire Rapids, but in practice the single-core performance of the Intel is going to be substantially better for most code. Your cache question doesn't really have a simple answer either. E.g. an AMD CPU is split into different CCXs. To simplify somewhat, the chip is broken up into several smaller compute units, with their own caches and memory controller. Intel has a completely different ring-based approach that's harder to summarise in one sentence. Overall though, for the sort of work you're describing the limiting factor is often memory bandwidth, not raw compute. Different platforms have very different membw/core figures, and I suspect if you started measuring that then you'd find it easier to predict your code's performance.
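As a rough illustration of the membw/core idea, here is a minimal sketch that computes the theoretical peak memory bandwidth per core from the channel count and transfer rate. The function name and the two example platforms are made up for illustration, not specs of any particular SKU, and a theoretical peak is only an upper bound: a measured figure from a bandwidth benchmark on the real machine would be the better predictor.

```python
def peak_bw_per_core(channels, mega_transfers_per_s, cores, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per core, in GB/s.

    Assumes a 64-bit (8-byte) data path per channel, as is usual for
    DDR4/DDR5 DIMMs. Sustained bandwidth will be lower than this.
    """
    total_gb_s = channels * mega_transfers_per_s * bytes_per_transfer / 1000
    return total_gb_s / cores


# Hypothetical platforms, chosen only to show how far the figure can vary.
many_cores = peak_bw_per_core(channels=12, mega_transfers_per_s=4800, cores=96)
fewer_cores = peak_bw_per_core(channels=8, mega_transfers_per_s=4800, cores=32)

print(f"12 channels, 96 cores: {many_cores:.1f} GB/s per core")
print(f" 8 channels, 32 cores: {fewer_cores:.1f} GB/s per core")
```

In this made-up comparison, the part with fewer cores has twice the bandwidth per core even though its total bandwidth is lower, which is the kind of difference that clock speed alone hides.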
|
This is easy to see when comparing the base clock frequencies, which are more or less proportional to the actual clock frequencies that will be reached in multi-threaded applications. For instance, a 7950X has a base clock of 4.5 GHz versus the 3.2 GHz of a 14900K. There are similar differences between Epyc and Xeon and between Threadripper and Xeon W.
In desktop CPUs Intel can hide their very poor multi-threaded performance by allowing a much higher power consumption. However, this method does not work for server and workstation CPUs, because these already have the highest TDP that is possible with the current cooling solutions, so in servers and workstations the bad Intel MT performance is much more visible. Intel hopes that this will change in 2024, when they will launch server and workstation CPUs made with the new Intel 3 CMOS process.
In the absence of actual benchmarks, a good proxy for the multi-threaded performance of a CPU is the product of the base clock frequency and the number of cores. For Intel hybrid CPUs, an E-core should be counted as 0.6 cores. For example, a Threadripper 7960X should be expected to be (24 cores x 4.2 GHz) / (16 cores x 4.5 GHz) = 1.4 times as fast as a 7950X in multi-threaded applications that are limited by the CPU cores (but twice as fast in applications that are limited by the memory throughput).
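The rule of thumb above can be written down as a small sketch. The function name `mt_proxy` is made up for illustration. The clock figures and the 0.6 E-core weight are the ones quoted in this comment, and the 8 P-core + 16 E-core split for the 14900K is added here as an assumption about that part. It is a rough proxy, not a replacement for benchmarks.

```python
def mt_proxy(base_ghz, p_cores, e_cores=0, e_core_weight=0.6):
    """Multi-threaded proxy: base clock (GHz) times effective core count.

    An E-core counts as e_core_weight of a full core (0.6 per the rule
    of thumb above).
    """
    return base_ghz * (p_cores + e_core_weight * e_cores)


threadripper_7960x = mt_proxy(4.2, p_cores=24)
ryzen_7950x = mt_proxy(4.5, p_cores=16)
# Assumption: 14900K counted as 8 P-cores + 16 E-cores at the 3.2 GHz base clock.
core_14900k = mt_proxy(3.2, p_cores=8, e_cores=16)

ratio = threadripper_7960x / ryzen_7950x
print(f"7960X vs 7950X: {ratio:.2f}x")
print(f"7950X vs 14900K: {ryzen_7950x / core_14900k:.2f}x")
```

The first ratio reproduces the 1.4 figure from the worked example. The proxy only applies to workloads limited by the CPU cores; it says nothing about memory-bound code.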