Hacker News
by Aurornis 99 days ago
Geekbench probably made the right choice to optimize for more realistic real-world workloads than for the more specific workloads that benefit from really high core counts. Geekbench is supposed to be a proxy for common use case performance.

High core count CPUs are only useful for specific workloads and should not be purchased as general purpose fast CPUs. Unless you’re doing specific tasks that scale by core count, a CPU with fewer cores and higher single-threaded throughput would be faster for normal use cases.

The callout against the poor journalism at Tom’s Hardware isn’t something new. They have a couple of staff members posting clickbait all the time. Sometimes the links don’t even work, or the articles make completely wrong claims. This is par for the course for the site now.

To be fair, the Tom’s Hardware article did call out these points and limitations, so this Slashdot critique is basically repeating the content of the Tom’s Hardware article, just more critically: https://www.tomshardware.com/pc-components/cpus/apples-18-co...


I think this actually concedes the main criticism.

If Geekbench 6 multicore is primarily a proxy for “common use case performance” rather than for workloads that actually use lots of cores, then it shouldn’t be treated as a general multicore CPU benchmark, and it definitely shouldn’t be the basis for sweeping 18-core vs 96-core conclusions.

That may be a perfectly valid design choice. But then the honest takeaway is: GB6 multicore measures a particular class of lightly/moderately threaded shared-task workloads, not broad multicore capability.

The criticism isn’t “every workload should scale linearly to 96 cores.” It’s that a benchmark labeled “multicore” is being used as if it were a general multicore proxy when some of its workloads stop scaling very early, including ones that sound naturally parallelizable.

Geekbench 6 isn't really marketed as a one-size-fits-all benchmark. It's specifically aimed at consumer hardware. The first paragraph on geekbench.com reads:

> Geekbench 6 is a cross-platform benchmark that measures your system's performance with the press of a button. How will your mobile device or desktop computer perform when push comes to crunch? How will it compare to the newest devices on the market? Find out today with Geekbench 6.

And further down,

> Includes updated CPU workloads and new Compute workloads that model real-world tasks and applications. Geekbench is a benchmark that reflects what actual users face on their mobile devices and personal computers.

The problem is, in practice, despite nonspecific marketing language, people do use the multicore benchmark to measure multicore performance. Including for things like Threadripper, which is not exactly an exotic science project CPU or non-personal or non-desktop.

> Including for things like Threadripper, which is not exactly an exotic science project CPU or non-personal or non-desktop.

We're talking about a CPU with a list price over $10,000.

Geekbench 6 is a bad test to use to assess the suitability of a 96-core Threadripper for the kinds of use cases where buying a 96-core Threadripper might make sense. But Geekbench 6 does a very good job of illustrating the point that buying a 96-core Threadripper would be a stupid waste of money for a personal desktop and the typical use cases of a personal desktop.

Holy hell. Lol. I did not realize how generous $PREVIOUS_EMPLOYER was.

> then it shouldn’t be treated as a general multicore CPU benchmark,

It is a general multi core benchmark for its target audience.

It’s not marketed as “the multi core scaling benchmark”. Geekbench is advertised as a benchmark suite and it has options to run everything limited to a single core or to let it use as many cores as it can.

96-core CPUs are not its target audience.

Geekbench 6 Multi-Core is fundamentally a single-task benchmark. It measures performance in workloads where the user is not running anything significant in the background. If you are a developer who wants to keep using the computer while compiling a large project in the background, Geekbench results are not particularly informative for you.

I've personally found that Apple's Pro/Max chips already have too many CPU cores for Geekbench.

As the owner of a 96-core 9995WX: nobody buys one for desktop PC software, much less laptop-level software.

To justify the investment you need to have tasks that scale out, or loads of heterogeneous tasks to support concurrently.

What tasks are you running on your 96-core 9995WX?

LLVM developer compiling the full LLVM stack every 10 minutes.

make -j97, presumably. Or MPI jobs.

Right, this is a car-priced CPU, and the only rational reason to have one is that you can exploit it for profit. One pretty great reason would be giving it to your expensive software developers so they don't sit there waiting on compilers.

I’ll push back and say there are people who buy it for desktop use, but primarily for workstation-like uses such as simulations.

A ton of my FX artist friends have specced out their home rigs with one or something in its orbit.

>"High core count CPUs are only useful for specific workloads and should not be purchased as general purpose fast CPUs. Unless you’re doing specific tasks that scale by core count, a CPU with fewer cores and higher single threaded throughput would be faster for normal use cases."

I design multithreaded backends that benefit from as many cores as possible, even if those cores aren't single-core champions. I think this is a very common use case.

Maybe I’m misunderstanding what you’re saying, but designing multithreaded backends is not a very common use case.

Most computer use cases don’t involve software development at all.

Running those backends is very common, just not in one's house or apartment.

Buried in the middle of that article:

> Furthermore, many of the suite’s multi-threaded subtests scale efficiently only to roughly 8 – 32 threads, which leaves much of such CPUs' parallel capacity idle, but which creates an almost perfect environment for Apple's CPUs that feature a relatively modest number of cores

That really invalidates the entire comparison; they should have canned the article if they had any integrity.

AMD has 16 cores, Apple has 18, Qualcomm has 18, Nvidia N1X has 20, and Intel has 24. All else being equal, you actually want as few cores as you can get away with, because a design with fewer, faster cores is less likely to be limited by Amdahl's Law. Arguably Intel/Nvidia CPUs are poorly designed and benchmarks have no obligation to accommodate them.
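Amdahl's Law makes the "fewer, faster cores" argument concrete. A minimal sketch, assuming a workload whose parallel fraction p splits perfectly across cores and two hypothetical chips that differ only in core count and per-core speed (the 1.3x per-core advantage is an illustrative number, not a measurement of any CPU named above):

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Speedup over one core when a fraction p of the work parallelizes perfectly across n cores."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be >= 1")
    # The serial part (1 - p) never shrinks; only the parallel part is divided by n.
    return 1.0 / ((1.0 - p) + p / n)


def relative_throughput(p: float, n: int, per_core_speed: float) -> float:
    """Throughput relative to one baseline core: per-core speed times the Amdahl speedup."""
    return per_core_speed * amdahl_speedup(p, n)


if __name__ == "__main__":
    # Hypothetical chips: 16 cores at 1.3x per-core speed vs 24 cores at 1.0x.
    for p in (0.5, 0.9, 0.99):
        few_fast = relative_throughput(p, 16, 1.3)
        many_slow = relative_throughput(p, 24, 1.0)
        print(f"p={p:.2f}  16 fast cores: {few_fast:5.2f}  24 slow cores: {many_slow:5.2f}")
```

Under these assumptions the 16-core chip wins at p = 0.5 and p = 0.9, and the 24-core chip only pulls ahead once the work is about 99% parallel, which is the kind of workload the comment argues consumer benchmarks need not favor.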

(I'm not counting high-end workstation/server CPUs because, as others in this thread have explained, Geekbench isn't intended for them.)

> (I'm not counting high-end workstation/server CPUs because, as others in this thread have explained, Geekbench isn't intended for them.)

Yeah, but that's the thing: the article, in both its headline and its contents, compares a CPU intended for high-end workstations with a consumer CPU meant for a laptop, using software explicitly not designed for workstation CPUs. That's the issue here!

Geekbench being questionable aside, these results have me stoked to see what an M5 Ultra looks like performance-wise!

I'm not sure you opened the blog post. The scaling is atrocious, even for tasks that should be extremely parallelizable. The Geekbench "Text Processing" benchmark supposedly processes 190 markdown files, and yet it tops out at just 1.34x the single-thread performance with 4 cores, and it drops with more cores! I admit my expertise is algorithms and optimization, so I may get incensed by inept developers more easily, but this is crazy... It is not realistic in any way, unless we assume the "real world" is just JS beginners scribbling code for a website...
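Inverting Amdahl's Law gives a rough sense of what a 1.34x speedup on 4 cores implies. A sketch, assuming the Text Processing result follows Amdahl's model (a fixed serial fraction and no contention overhead); the model clearly does not capture everything here, since Amdahl's Law never predicts a slowdown when cores are added:

```python
def implied_parallel_fraction(speedup: float, n: int) -> float:
    """Solve S = 1 / ((1 - p) + p / n) for p, the parallelizable fraction of the work."""
    if n < 2:
        raise ValueError("need at least 2 cores to infer p")
    if not 1.0 <= speedup <= n:
        raise ValueError("speedup must lie in [1, n] under Amdahl's Law")
    # Rearranged: 1/S = 1 - p * (1 - 1/n)  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n)


def parallel_efficiency(speedup: float, n: int) -> float:
    """Fraction of the n cores' combined capacity that is actually put to use."""
    return speedup / n


if __name__ == "__main__":
    s, n = 1.34, 4  # figures quoted in the comment above
    p = implied_parallel_fraction(s, n)
    print(f"implied parallel fraction: {p:.2f}")
    print(f"parallel efficiency on {n} cores: {parallel_efficiency(s, n):.1%}")
    # With p fixed, adding cores can never push the speedup past 1 / (1 - p).
    print(f"Amdahl ceiling with unlimited cores: {1.0 / (1.0 - p):.2f}x")
```

Taken at face value, the figures imply only about a third of that subtest's work runs in parallel, capping it near 1.5x no matter how many cores the machine has.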
The only reason for a multicore benchmark is when the benchmark represents some common task that is not embarrassingly parallel. If your multicore benchmark is just a single-threaded test run on a bunch of cores, it’s pointless. I can simply do math to find that result: min(single-core performance multiplied by the number of cores, single-core performance multiplied by memory bandwidth divided by the bandwidth required per thread).
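That back-of-the-envelope estimate is a small roofline-style bound: the result is capped by whichever resource saturates first, compute or memory bandwidth. A sketch, assuming every core runs an independent copy of the same single-threaded test and that memory bandwidth is the only shared bottleneck (caches, power limits, and clock throttling are ignored); the function name and all numbers are hypothetical:

```python
def embarrassingly_parallel_estimate(
    single_core_score: float,
    cores: int,
    memory_bandwidth_gbs: float,
    bandwidth_per_thread_gbs: float,
) -> float:
    """Upper-bound multicore score for independent copies of a single-threaded test."""
    if cores < 1 or bandwidth_per_thread_gbs <= 0:
        raise ValueError("cores must be >= 1 and per-thread bandwidth must be positive")
    # Compute limit: every core runs its own copy at full single-core speed.
    compute_limit = single_core_score * cores
    # Bandwidth limit: memory can only feed this many copies at full speed.
    threads_fed = memory_bandwidth_gbs / bandwidth_per_thread_gbs
    bandwidth_limit = single_core_score * threads_fed
    # Whichever resource runs out first caps the result.
    return min(compute_limit, bandwidth_limit)


if __name__ == "__main__":
    # Hypothetical 18-core laptop chip: compute-bound.
    print(embarrassingly_parallel_estimate(3000.0, 18, 500.0, 10.0))
    # Hypothetical 96-core chip: bandwidth-bound, so extra cores sit idle.
    print(embarrassingly_parallel_estimate(3000.0, 96, 300.0, 10.0))
```

A benchmark built this way adds no information beyond its two inputs, which is the point being made: only workloads that are not embarrassingly parallel tell you something new.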

A good benchmark will be something people actually do.