

by impure-aqua 476 days ago

I think that is much too hand-wavy regarding the performance differences.

Both Passmark and Geekbench are aggregates of a variety of tasks. If you dig into the individual tests that make up the aggregate score, you will find that different platforms perform better, or worse, on certain tests than others. I would wager that, for many applications, only a subset of these tasks is relevant to the performance of the application, yet such benchmark suites distil out all nuance into a single value.
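To make that concrete, here is a minimal sketch with made-up numbers. The subtest names, the scores, and the use of a plain geometric mean are all assumptions for illustration; this is not the actual scoring formula of either suite. It shows two hypothetical platforms with identical aggregate scores that differ by 4x on a workload that only cares about one subtest.

```python
from math import prod


def geometric_mean(scores):
    """Collapse a dict of subtest scores into one number (assumed aggregation)."""
    values = list(scores.values())
    return prod(values) ** (1.0 / len(values))


def weighted_score(scores, weights):
    """Score a platform for one application, given how much each subtest matters."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


# Hypothetical subtest scores, invented for this example only.
platform_a = {"compression": 2000.0, "compile": 2000.0, "matrix_math": 500.0}
platform_b = {"compression": 1000.0, "compile": 1000.0, "matrix_math": 2000.0}

# Hypothetical application that depends only on matrix math.
matrix_heavy_weights = {"matrix_math": 1.0}

aggregate_a = geometric_mean(platform_a)
aggregate_b = geometric_mean(platform_b)
app_a = weighted_score(platform_a, matrix_heavy_weights)
app_b = weighted_score(platform_b, matrix_heavy_weights)

print(f"aggregate: A={aggregate_a:.0f}  B={aggregate_b:.0f}")
print(f"matrix-heavy app: B is {app_b / app_a:.1f}x faster than A")
```

The single aggregate number cannot distinguish the two platforms, while the per-application view can.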

Here is a personal anecdote. I have tried running CASTEP (built from source), a density functional theory calculator, on both an M1 Max MacBook Pro [0], and on a Ryzen 7840HS Lenovo laptop [1]. A cursory glance at those Geekbench results linked might make you expect that the performance is roughly equivalent, but the Ryzen outperforms the Mac by about 4x, a huge difference.

What happens if we try and dig into any particular benchmark to explain this? If you click on any particular benchmark in the Geekbench search lists, you will see they test things like "File Compression", "HTML5 Browser", "Clang". Which of these maps most closely to the sorts of instructions used in CASTEP? Your guess is as good as mine.

If anything, I would say Passmark is quite a bit less abstract about this. Looking at the Mac [2] and Ryzen [3] Passmark results, you can see the Ryzen outperforms the Mac by about 2x on "extended instructions", which appear to involve some matrix math, and also about 2x on "integer math". The Mac, meanwhile, appears to be extremely good at finding prime numbers, at over 3x the speed of the Ryzen. Presumably the Ryzen's balance of instruction performance is more useful for DFT calculations than the Mac's, which perhaps is weaker in areas that might matter for this application, but stronger in areas that might matter for others.

Of course, optimization is likely a component of this. How much effort is put into the OpenBLAS, MPI, etc., implementations on aarch64 darwin vs. x86-64 linux? This is a good question. It is, however, mostly irrelevant to the end consumer, who wishes to use this software in their further research, rather than dig into high-performance computing library optimization.

[0] https://browser.geekbench.com/search?q=m1+max

[1] https://browser.geekbench.com/search?q=7840hs

[2] https://www.cpubenchmark.net/cpu.php?cpu=Apple+M1+Max+10+Cor...

[3] https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+7+PRO+784...

2 comments

seec 476 days ago

This is my experience as well. Geekbench heavily favors the type of workload that runs best on Apple hardware (these tend to be the general-case workloads most people run), but in practice, if you have complex software to run, your experience will not match the benchmark numbers.

I think PassMark is more honest as well, because it just gives scores for calculation throughput instead of specific tasks. It more closely matches the experience you will get with a varied load.

But since it's Apple we are talking about, their users just want to think they have the best and that's all that matters.


krunkcoin 473 days ago

PassMark is "more honest"? It represents a varied load??? No, sorry, it's just not good. Seriously, read their own documentation.

https://www.cpubenchmark.net/cpu_test_info.html

Right from the top it's amateurish stuff: their idea of an integer benchmark to measure "raw" CPU throughput (whatever that means) is to make a bunch of random ints and add/subtract/multiply/divide them.
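For reference, the style of test being described looks roughly like the sketch below. This is not PassMark's code; the function name, operand range, and iteration count are hypothetical, and Python is used only to show the shape of the loop, not to measure real CPU throughput.

```python
import random
import time


def random_int_math(n_pairs=100_000, seed=0):
    """Time add/subtract/multiply/divide over random integer pairs.

    Returns (operations per second, checksum). The checksum only exists so
    the arithmetic results are actually used.
    """
    rng = random.Random(seed)
    # Operands start at 1 so the integer division never divides by zero.
    pairs = [
        (rng.randint(1, 2**31 - 1), rng.randint(1, 2**31 - 1))
        for _ in range(n_pairs)
    ]

    checksum = 0
    start = time.perf_counter()
    for a, b in pairs:
        checksum ^= (a + b) ^ (a - b) ^ (a * b) ^ (a // b)
    elapsed = time.perf_counter() - start

    return 4 * n_pairs / elapsed, checksum


ops_per_second, checksum = random_int_math()
print(f"{ops_per_second:,.0f} integer ops/s (checksum {checksum})")
```

The loop has no data-dependent branches and no realistic memory access pattern, so it exercises little beyond the arithmetic units themselves.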

Very few programs do a high volume of either integer multiply or divide. And when they do, they generally aren't doing it on random numbers. This is the kind of thing which gives synthetic benchmarks their highly deserved bad rep. It might be even worse than Dhrystone MIPS, and believe me, in benchmark nerd circles, that is a fucking diss.

If you look up Geekbench's docs, you'll find that it's all about real-world compute tasks. For example, one of the int tests in their suite is to compile a reference program with the Clang compiler. Compilers are a reasonably good litmus test of integer performance; they heavily stress the CPU features most responsible for high integer performance in this day and age. (Branch prediction, memory prefetching, out-of-order execution, speculation, that kind of thing.)

You claimed that PassMark reflects "complex" software, and Geekbench doesn't. However, I would be willing to bet that Clang alone is far more complex than all of PassMark's CPU benchmarks put together, whether you measure by SLOC or program structure.

Note that none of this has anything to do with Mac vs PC. PassMark is simply a bad benchmark that should not be used, period. That said, there are a bunch of warning signs that PassMark's ports to everything outside its native x86 Windows are probably quite sloppy, so it's even less useful for cross-platform comparisons.


aurareturn 476 days ago

Geekbench correlates with SPEC, the industry-standard CPU benchmark and what enterprise companies such as AWS use to judge a CPU's performance. The correlation is 0.99.

https://medium.com/silicon-reimagined/performance-delivered-...
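For anyone unsure what a correlation figure like that measures, here is a minimal sketch of a Pearson correlation coefficient computed over paired scores. The score lists are hypothetical placeholders, not data from the linked article, and I am assuming the quoted figure is a Pearson coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-CPU scores on two benchmark suites (invented numbers).
suite_one_scores = [1200.0, 1500.0, 1800.0, 2300.0, 2600.0]
suite_two_scores = [5.1, 6.0, 7.4, 9.0, 10.5]

r = pearson_r(suite_one_scores, suite_two_scores)
print(f"r = {r:.3f}")
```

A value near 1 means that CPUs ranking higher on one suite also rank higher on the other, in close to a straight-line relationship.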

Passmark is an outdated benchmark that hasn't been updated to use ARM instructions.
