Geekbench multi falls off a cliff after ~16 cores. E.g. the Epyc 9654 with 96 cores benches lower than the Ryzen 7950X with 16 cores of the same generation.
A 1.6x single core performance difference won't negate a 6x core count advantage in peak multi core performance. The problem above is not that the cores are slower, it's that Geekbench will literally not utilize the additional cores in the first place. This compounds with what you're saying - the few cores that do get used have high clocks on the low core count optimized part but lower clocks on the high core count optimized part.
Compare this to a multithreaded benchmark that does scale to all of the cores and you'll find the higher core count CPUs are able to push significantly higher scores despite the single thread difference e.g. https://www.cpubenchmark.net/high_end_cpus.html has them at 62,711 vs 117,317 in the opposite ranking direction. That should feel about right, otherwise AMD would only make the 16 core high frequency CPUs instead of 128 core low frequency monsters.
That's not to say the Geekbench score is bad or useless. It represents a specific type of workload... just not "peak multi core performance". It's more indicative of "mixed workload performance", where the Ultra's 2x core count is more likely to be irrelevant.
> Having such a large difference in single-core performance, will negate the sizable difference in total core count (96 vs 16).
But why? Wouldn’t total score be approximately corecount*corescore? Of course it’s not exactly that because not all cores run full speed at the same time, but how are the cores weighted that 16 cores are better than 96 cores with half the speed each?
Geekbench 5 and earlier constructed the multi-core test as essentially running N independent copies of the single-core test. This effectively pretends that every subtest is embarrassingly parallel. Geekbench 6 switched to having the multi-core test actually operate like real multi-threaded software: a fixed-size problem is broken up to be divided among available cores, with a non-zero amount of coordination between threads and potential for less than perfectly linear scaling because Amdahl's Law isn't being ignored.
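To put rough numbers on the difference, here's a minimal sketch of the two scoring models. The 1.6x per-core figure is the one mentioned upthread; the parallel fractions and the function names are assumptions for illustration, not anything Geekbench publishes.

```python
def independent_copies_score(cores, per_core_speed):
    """Geekbench 5 style model: N independent copies, so the score scales linearly."""
    return cores * per_core_speed


def amdahl_score(cores, per_core_speed, parallel_fraction):
    """Fixed-size problem: only parallel_fraction of the work is split across cores."""
    serial = 1.0 - parallel_fraction
    speedup = 1.0 / (serial + parallel_fraction / cores)
    return per_core_speed * speedup


# Assumed parts: 16 fast cores (1.6x per core) vs 96 slower cores (1.0x per core).
for p in (1.0, 0.99, 0.9):
    small = amdahl_score(16, 1.6, p)
    big = amdahl_score(96, 1.0, p)
    print(f"parallel fraction {p:.2f}: 16-core {small:.1f}, 96-core {big:.1f}")
```

Under these assumptions the 96-core part wins comfortably when the work is 99% parallel, but falls behind the 16-core part once the parallel fraction drops to 90%. That's roughly the "16 cores beat 96 slower cores" effect being asked about.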
But that’s a very specific thing to adjust a benchmark for, what if I want to host 50 VMs on one server for example? Then the 100 core server would be much better than the 16 core server, even though it has a lower benchmark value.
It's not a "very specific" thing to adjust a benchmark for. It's the default case for practically all consumer workloads, and Geekbench is a consumer-focused benchmark.
I played around a bit lately with finding ways to dramatically multithread code in golang, mostly for fun. What I found was that there was a threshold where the overhead of spinning up all the threads at the start and synchronizing them at the end overwhelmed the time savings from actually performing the work in multiple threads.
It wouldn't surprise me if the PDF renderer and background blur tests were fast enough tasks that spinning up 96 threads to split the work across all those cores is a waste of time compared to how quickly the task itself completes. It's akin to trying to hammer in 50 nails by getting 50 people, handing out 50 hammers, assigning each person one nail, telling them "okay, start!", and then inspecting everyone's work afterwards; at some point, it's faster just to break it into two or three tasks.
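A toy model of that threshold, as a rough sketch: assume the work splits evenly, and assume each extra thread adds a fixed setup/join cost that is paid serially. The 1000 us task size, the 20 us overhead, and the function names are all made up for illustration, not measurements.

```python
def modeled_time(total_work_us, per_thread_overhead_us, threads):
    """Wall time if work splits evenly but each thread adds a fixed setup/join cost."""
    return per_thread_overhead_us * threads + total_work_us / threads


def best_thread_count(total_work_us, per_thread_overhead_us, max_threads):
    """Brute-force the thread count with the lowest modeled time."""
    return min(
        range(1, max_threads + 1),
        key=lambda n: modeled_time(total_work_us, per_thread_overhead_us, n),
    )


# Assumed numbers: a 1000 us task and 20 us of overhead per thread.
for n in (1, 4, 96):
    print(n, "threads:", round(modeled_time(1000, 20, n), 1), "us")
print("best:", best_thread_count(1000, 20, 96))
```

With those made-up numbers the model bottoms out at 7 threads, and using all 96 comes out slower than using just one. Real schedulers and thread pools behave differently, but the shape of the curve is the point.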
This was a surprise for me as well. I have two EPYC 9754 in a dual socket server, so 256 cores, and the test did not perform as well as I expected it to. It didn’t even load up all the cores, which is what I needed it to do.
I ended up using something else to generate the load I needed, but I can’t remember exactly what. I think it might have been a Monero benchmarking tool?
That's one way to view it. Another is the benchmark doesn't intend to measure the "peak multi-core CPU performance" the article assumes multi-core score is meant to measure. It's really measuring something more like mixed workload performance.
As a performance metric that does seem like it would be more valuable for lots of use cases, so measuring that seems good.
Maybe they need to add an additional “cpu bound multiprocessing perf”, and make it easier for professional tech reporters to understand complex concepts like benchmark numbers :D (in fairness to the reporters it does sound like the benchmark name legitimately implies that it’s a max parallel throughput benchmark, but if this is your job you should really know what your benchmarks are actually measuring).
Honestly a benchmark I would like to see - which is more of a software/kernel/os/scheduling one - is “how responsive is this machine under heavy load”.
A non zero part of wanting that as part of a benchmark is that popular benchmarks often seem to be the only way to get companies to fix “uncommon” issues.
--
https://browser.geekbench.com/processors/amd-epyc-9654
https://browser.geekbench.com/processors/amd-ryzen-9-7950x