by axaxs 4657 days ago
Clock speed is an unfortunate marketing gimmick anymore. I dare compare it to peak horsepower. A 3 GHz chip from today will run circles around a chip from 2002, and with less power to boot. AMD is ahead in the clock speed race, but gets beaten handily by "slower" Intel processors while using twice the power. The focus going forward is going to be on power efficiency and using more cores, not clock speed.
4 comments

This is true, but it's missing the point. A modern CPU gets probably 50% more work out of a median clock cycle and runs 33% faster for single-threaded (turbo) workloads. So it's about twice as fast per core. And sure, there are four cores on the die.

But back up another decade to 1992, when a top-of-the-line PC was a 50 MHz 486 with well under half the IPC of the linked Northwood, running at a clock 60x slower.

For those of us who remember the 80's and 90's, it's a very different world we live in.
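
The arithmetic above can be sketched in a few lines. This is only a back-of-the-envelope model, assuming single-thread speed scales as IPC times clock frequency; the function and variable names are made up for illustration, and the 3.06 GHz figure is the Northwood clock mentioned elsewhere in this thread.

```python
# Rough model (an assumption, not a benchmark):
#   single-thread speed ~ IPC x clock frequency

def single_thread_speedup(ipc_ratio, clock_ratio):
    """Speedup of chip A over chip B given their IPC and clock ratios."""
    return ipc_ratio * clock_ratio

# Modern CPU vs. the 2002 Northwood: ~50% more work per cycle, ~33% higher clock.
modern_vs_northwood = single_thread_speedup(1.5, 1.33)

# 3.06 GHz Northwood vs. a 50 MHz 486 (both expressed in MHz).
clock_ratio_486 = 3060 / 50

# "Well under half the IPC" means an IPC ratio above 2, so 2.0 gives a floor.
northwood_vs_486_floor = single_thread_speedup(2.0, clock_ratio_486)

print(f"modern vs Northwood: about {modern_vs_northwood:.1f}x")
print(f"Northwood vs 486, clock only: about {clock_ratio_486:.0f}x")
print(f"Northwood vs 486, overall: at least {northwood_vs_486_floor:.0f}x")
```

So one decade bought roughly 2x per core, while the decade before it bought well over 100x.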

The (single-threaded) performance improvement is significantly larger. Anandtech has a 2005-vintage Pentium in their benchmarks: http://anandtech.com/bench/product/92?vs=836 and there is probably a significant perf difference between 2002 and 2005.

And you can't really ignore the massive improvements gained via GPUs. There are your 100x differences.

First off, you're absolutely right. The difference in IPC between a modern CPU and a 2002 vintage Pentium 4 is pretty incredible, and the way forward is all about power efficiency and cramming more cores onto one die.

However, I think it's unfair to say clock speed is an unfortunate marketing gimmick "anymore" when it was a gimmick all the way back in 2002 when Intel released the 3.06 GHz "Northwood" Pentium 4 that the OP's linked article references. In fact, it was a gimmick that caused one of the biggest strategy/roadmap blunders Intel ever made.

Intel designed NetBurst (the architecture that the P4 was based on) to do one thing really well: allow Intel to ramp up clock speeds quickly. The architectural choices they made to enable this severely hobbled the P4's performance, especially the 20 (later 31!) stage pipeline that made the penalty for branch mispredictions pretty awful.

Intel eventually released P4s that clocked as high as 3.8 GHz and had an unheard of (in the x86 space) 115 watt TDP, but when the Athlon 64 was released, AMD could smoke Intel's fastest P4s using slower clocked CPUs with lower TDPs. Instead of focusing on raw clock speed, AMD focused on architectural improvements like x86-64, HyperTransport, and an integrated memory controller. (Intel CPUs wouldn't see QPI or an integrated memory controller until Nehalem, released five years after the first Athlon 64.)

As you say, the tables are now turned — clock for clock, the IPC of AMD's Piledriver core is behind that of even Intel's (two generations old) Sandy Bridge core, and all AMD seems to be able to do is add more cores and crank up the clock speed. Unfortunately for AMD, adding more cores doesn't help single-threaded performance, and a very nasty side-effect of increasing clock speed is that the processor's TDP increases disproportionately: the 4.7 GHz FX-9590 has a whopping 220 watt TDP, while the 4.0 GHz FX-8350's TDP is 125 watts.
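
To put numbers on "disproportionately", here is a quick sketch using only the clock and TDP figures quoted above. The dictionary layout and variable names are illustrative, and TDP is a rated thermal figure rather than measured power draw, so treat the result as a rough indication.

```python
# Figures as quoted in the comment above.
fx_8350 = {"clock_ghz": 4.0, "tdp_watts": 125}
fx_9590 = {"clock_ghz": 4.7, "tdp_watts": 220}

# Fractional increase going from the FX-8350 to the FX-9590.
clock_gain = fx_9590["clock_ghz"] / fx_8350["clock_ghz"] - 1
tdp_gain = fx_9590["tdp_watts"] / fx_8350["tdp_watts"] - 1

# Rated watts per GHz for each part.
watts_per_ghz_8350 = fx_8350["tdp_watts"] / fx_8350["clock_ghz"]
watts_per_ghz_9590 = fx_9590["tdp_watts"] / fx_9590["clock_ghz"]

print(f"clock: +{clock_gain:.1%}, TDP: +{tdp_gain:.1%}")
print(f"W/GHz: {watts_per_ghz_8350:.1f} -> {watts_per_ghz_9590:.1f}")
```

A clock bump of under 20% costs about 76% more rated power. That fits the usual first-order picture of dynamic power scaling with voltage squared times frequency, since higher clocks generally need higher voltage.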

Actually, the performance difference is small. The biggest gain we have today is multiple cores, so no one thread can hog all processing pipelines. If you're pushing through 1 billion instructions on a 3 GHz core at one instruction per cycle, it will still take 1/3 of a second for your CPU to chunk through all of it, but other code can be processed simultaneously on another core.
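
The 1/3 of a second figure works out if you assume a 3 GHz clock and one instruction retired per cycle; neither number is stated in the comment, so both are assumptions here. A minimal sketch, with a hypothetical helper name:

```python
def execution_time_seconds(instructions, clock_hz, ipc=1.0):
    """Time to retire `instructions` at a given clock and instructions-per-cycle."""
    return instructions / (clock_hz * ipc)

# Assumed: 3 GHz clock, IPC of 1.
baseline = execution_time_seconds(1_000_000_000, 3e9)

# Same clock, but 50% more work per cycle (the IPC gain suggested upthread).
higher_ipc = execution_time_seconds(1_000_000_000, 3e9, ipc=1.5)

print(f"IPC 1.0: {baseline:.3f} s")
print(f"IPC 1.5: {higher_ipc:.3f} s")
```

Under this model the clock alone doesn't fix the runtime; the IPC term moves it just as much, which is the point the other commenters are making.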

I know, I know. My laptop (Lenovo T400) has almost the exact same specs as the old Sunfire V20 rackmounts I have (2x 2.4 GHz cores, 4 GB RAM, passable video), but the laptop can run on batteries for 5 hours.

But clock speed is still king -- this T400 runs circles around a Lenovo W520 with a 1.6 GHz Core i7 and 3x the RAM. I know because I had a W520 for work, and I could see the difference.

If you have more cores and a single task, then you have to deal with partitioning the dataset for that task, if it is partitionable at all. That's where the problem lies with SMP computing. Branch predictors have gotten a whole lot smarter, just as compilers have gotten better at generating code that fits into multiple pipelines.
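
For a task that does partition cleanly, the split-and-combine step looks roughly like the sketch below. It uses a sum of squares as a stand-in workload and Python's standard `concurrent.futures.ProcessPoolExecutor`; the helper names and the choice of workload are assumptions for illustration only.

```python
from concurrent.futures import ProcessPoolExecutor

def partition(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def sum_squares(chunk):
    """Work done independently on one chunk (no shared state)."""
    return sum(x * x for x in chunk)

def parallel_sum_squares(data, workers=4):
    """Farm each chunk out to a worker process, then combine partial results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, partition(data, workers)))

if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum_squares(numbers))
```

This only works because each chunk can be processed without looking at the others. A task where every step depends on the previous result has no such split, and extra cores don't help it.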