Hacker News new | ask | show | jobs
by arp242 747 days ago
> I would stick to SPEC and Geekbench.

I will repeat:

"On Geekbench I gave up after scrolling a few pages."

SPEC doesn't seem to have easily browsable results, but we can find the Cinebench 2024 ones easy and guess what? Apple isn't at the top. Not even close: https://www.cgdirector.com/cinebench-2024-scores/

2 comments

Geekbench has a seperate page for each "instruction set".

For Apple you need to go to https://browser.geekbench.com/mac-benchmarks

Then compare numbers by hand I assume.

Though what I would love is compile-time vs. $ (as mentioned, I'm a software developer). The 7950x is $500 and a very fast SSD is $400, fast 64gb is $200, very good board is $400 so I get a very fast dev machine for ~$1700.

I compiled a few previously. Sorry for the formatting:

ASUS ROG Zephyrus G16 (2024)

Processor: Intel Core Ultra 9 185

Memory: 32GB

Cargo Build: 31.85 seconds

Cargo Build --Release: 1 minute 4 seconds

ASUS ROG Zephyrus G14 (2024)

Processor: AMD Ryzen 8945HS / Radeon 780M

Memory: 32GB

Cargo Build: 29.48 seconds

Cargo Build --Release: 34.78 seconds

ASUS ROG Strix Scar 18 (2024)

Processor: Intel Core i9 14900HX

Memory: 64GB

Cargo Build: 21.27 seconds

Cargo Build --Release: 28.69 seconds

Apple MacBook Pro (M3 Pro 11 core)

Processor: M3 Pro 11 core

Cargo Build: 13.70 seconds

Cargo Build --Release: 21.65 seconds

Apple MacBook Pro 16 (M3 Max)

Processor: M3 Max

Cargo Build: 12.70 seconds

Cargo Build --Release: 15.90 seconds

Firefox Mobile build:

M1 Air: 95 seconds AMD 5900hx: 138 seconds Source: https://youtu.be/QSPFx9R99-o?si=oG_nuV4oiMxjv4F-&t=505

Javascript builds

Here, Alex compares the M1 Air running Parallels emulating Linux vs native Linux on AMD Zen2 mobile. The M1 is still significantly faster. https://youtu.be/tgS1P5bP7dA?si=Xz2JQmgoYp3IQGCX&t=183

Docker builds

Here, Alex runs Docker ARM64 vs AMD x86 images and the M1 Air built the image 2x faster than an AMD Zen2 mobile. https://youtu.be/sWav0WuNMNs?si=IgxeMoJqpQaZv2nc&t=366

Anyways, Alex has a ton more videos on coding performance between Apple, Intel and AMD.

Lastly, this is not M1 vs Zen2 but it's M2 vs Zen4.

LLVM build test

M2 Max: 377 seconds Ryzen 9 7940S: 826 seconds

@aurareturn really appreciate the comparison and your effort (upvoted). As I no longer use a laptop (don't need it, too expensive, breaks, no upgrades, but that is just me), browsing through that channel looks like he focuses on laptops.

Would love to see a 7950x/64gb/SSD5 comparison, perhaps (see https://www.octobench.com/ for SSD impact on Go compilation) he will create one in the future (channel bookmarked). But would I still need to use a laptop, I would probably switch back to Apple (have an iMac Pro as decoration standing in the shelf, was my last Apple dev machine).

The $5000 16 Pro looks great as a machine. When still working at eBay, the nice thing was one always got the max specced machine as a developer back in the days - so that would probably be it. Real nice one.

[Edit]

Someone suggested looking at Geekbench Clang, which brought some insights for my desktop usage:

(it looks like top CPUs are more or less the same, ~15% difference)

"Randomly" picking

    M2 Ultra   233.9 Klines/sec
    7950x      230.3 Klines/sec  
    14900K     215.3 Klines/sec  
    M3 Max     196.5 Klines/sec
Ah right; that's confusing! Seems that the AMD and Intel chips are much faster though, consistent with other benchmarks.

Speed vs. $ is of course a different story than pure speed; kinda hard to capture in a number I guess.

Most Go projects compile more than fast enough even on my 7 year old i5, although there are exceptions (mostly crummy hyper-overengineered projects).

I think it depends on what you do. If you write a database in Go, there is no problem with 5min compile time. If you write a web app, 10 sec compile times are already annoying.
5 minute compile time is annoying no matter what project you are trying to compile. Thankfully most every golang package I compile, takes less than 2-3 seconds.
"go test" compiles your package. Would not want a 5 minute feedback loop!
Here's GB6: https://browser.geekbench.com/v6/cpu/compare/6339005?baselin...

Note: M3 Max is a 40w CPU maximum, while 7950x is a 230w CPU maximum. The stated 170w max is usually deceptive from AMD.

Source for 7950x power consumption: https://www.anandtech.com/show/17641/lighter-touch-cpu-power....

Note that the M3 Max leads in ST in Cinebench 2024 and 2-3x better in perf/watt. It does lose in MT in Cinebench 2024 but wins in GB6 MT.

Cinebench is usually x86 favored as it favors AVX over NEON as well as having extremely long dependency chains, bottlenecked by caches and partly memory. This is why you get a huge SMT yield from it and why it scales very highly if you throw lots of "weak" cores at it.

This is why Cinebench is a poor CPU benchmark in general as the vast majority of applications do not behave like Cinebench.

Geekbench and SPEC are more predictive of CPU speed.

It the end, what matters is real-world performance and different workloads have different bottlenecks. For people who use Cinema 4D, Cinebench is the most accurate measurement of hardware capabilities they can get. It's very hard to generalize what will matter for the vast majority of people. I find it's best to look at benchmarks for the same applications or similar workloads to what you'll be doing. Single score benchmark like Geekbench are fun and quick way to get some general idea about CPU capabilities, but most of the time they don't match specifics of real-world workloads.

Here's a content creation benchmark (note that for some tasks a GPU is also used):

https://www.pugetsystems.com/labs/articles/mac-vs-pc-for-con...

Cinebench is being used like a general purpose CPU benchmark when, like you said, it should only be used to judge the performance of Cinema 4D. Cinema 4D is a niche software in a niche. Why is a niche in a niche software application being used to judge overall CPU performance? It doesn't make sense.

Meanwhile, Geekbench does run real world workloads using real world libraries. You can look at subtest scores to to see "real world" results.

Pugetsystem benchmarks are pretty good. It shows how Apple SoCs punch above their weight in real world applications over benchmarks.

Regardless, they are comparing desktop machines using as much as 1500 watts vs a laptop that maxes out at 80 watts and Apple is still competing well. The wins in the PC world are usually due to beefy Nvidia GPUs that are applications have historically optimized for.

That's why I originally said ARM is leading AMD - specifically Apple ARM chips.

Some Geekbench 6 workloads and libraries:

- Dijkstra's algorithm: not used by vast majority of applications.

- Google Gumbo: unmaintained since 2016.

- litehtml: not used by any major browser.

- Clang: common on HN, but niche for general population.

- 3D texture encoding: very niche.

- Ray Tracer: a custom ray tracer using Intel Embree lib. That's worse than Cinebench.

- Structure from Motion: generates 3D geometry from multiple 2D images.

It also uses some more commonly used libraries, but there's enough niche stuff in Geekbench that I can't say it's a good representation of a real world workloads.

> Regardless, they are comparing desktop machines using as much as 1500 watts vs a laptop that maxes out at 80 watts and Apple is still competing well. The wins in the PC world are usually due to beefy Nvidia GPUs that are applications have historically optimized for.

They included a laptop, which is also competing rather well with Apple offerings. And it's not PC's fault you can't add a custom GPU to Apple offerings.

Cinebench R23 was built on Intel Embree. ;)

You can cherry pick libraries that Geekbench 6 uses that are old or niche but do they do a good job as proxies?

The point of a general CPU benchmark is to predict CPU performance. For that, Geekbench does an outstanding job. [0]

[0]https://medium.com/silicon-reimagined/performance-delivered-...

> Cinebench R23 was built on Intel Embree. ;)

Yes, and the renderer is the same as in Cinema 4D which is used by many, while custom ray tracer build for Geekbench is not used outside benchmarking.

The blog post you linked just shows that one synthetic benchmark correlates with another synthetic benchmark. Where do SPEC CPU2006/2017 benchmarks guarantee or specify correlation with real-world performance workloads?

Cinebench fits characteristics of a good benchmark defined by SPEC [1]. It's not a general benchmark, but biased benchmark. It's floating-point intensive. It should do a perfect job of evaluating Cinema 4D rendering performance, a good job as proxy for other floating-point heavy workloads, but a poor job as proxy for other non-floating-point heavy workloads. The thing is, most real world workloads are biased. So, no single-score benchmark result can do an outstanding job of predicting CPU performance for everyone (which is what a general CPU benchmark with a single final score aims to do). They are useful, but limited in their prediction abilities.

[1] https://www.spec.org/cpu2017/Docs/overview.html#Q2