by deepblueq 4759 days ago

The problem is that clock speed doesn't really mean anything concrete in terms of real-world performance. It's strictly a marketing thing.

For example, what if a chip used a 10 GHz clock for distribution and divided it down to 5 GHz everywhere it was actually used? (Not that I know of any reason to do such a thing besides marketing.) Would it be marketable as a 10 GHz chip? The manufacturer would certainly be in hot water if enthusiasts ever found out...

Even without such contrived scenarios, CPUs get different amounts of stuff done per clock.

Something I keep seeing, even on Slashdot and Hacker News, is the idea that a CPU that has to clock higher for a given performance will use more power. It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio.

If anyone's got any contrary ideas on that, I'd love to hear them. All I can think of is that higher clocks would correlate with longer pipelines, but Bulldozer's pipeline isn't even that long.

3 comments

VLM 4759 days ago

"is the idea that a CPU that has to clock higher for a given performance will use more power."

This is like a dog whistle to the EEs; they're going to get all riled up by programmers with screwdrivers. You can model a stereotypical FET gate as a capacitor: all you're really doing is charging and discharging capacitors, either in the FET gates or in the theoretical capacitance of the transmission lines. Right out of the C=Q/V definition of what capacitance is, mushed up against some Ohm's law and some algebra, you end up with P = C times V squared times F. So you can see the intense excitement about lowering core voltages and making gates and lines smaller (lowering C), all in a tradeoff to improve the P/F or F/P (whatever) ratio.

The important part is it's pretty easy: right outta Ohm's law and the def of what capacitance is, power is directly proportional to frequency.
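As a minimal sketch of that formula, here is P = C times V squared times F in Python. The capacitance, voltage, and frequency values are made-up placeholders, not measurements of any real chip, and `dynamic_power` is just an illustrative name.

```python
def dynamic_power(c_farads, v_volts, f_hertz):
    """Switching power of a capacitive load: P = C * V^2 * f."""
    return c_farads * v_volts ** 2 * f_hertz

# Hypothetical numbers: 1 nF of switched capacitance at 1.2 V.
base = dynamic_power(1e-9, 1.2, 3e9)     # 3 GHz
doubled = dynamic_power(1e-9, 1.2, 6e9)  # same chip, same voltage, 6 GHz

# With C and V held fixed, power tracks frequency one-for-one.
print(f"3 GHz: {base:.2f} W, 6 GHz: {doubled:.2f} W, ratio {doubled / base:.1f}")
```

Holding C and V fixed is the key assumption here; the ratio comes out to 2 only because nothing else changes.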

tbrownaw 4759 days ago

"The important part is it's pretty easy: right outta Ohm's law and the def of what capacitance is, power is directly proportional to frequency."

There's also the fact that your transistors have a particular voltage that they switch state at, which means that they switch faster if you drive the gate/line capacitance with a higher voltage.

Which means that chips designed for lower frequencies can be designed to use lower voltages, which can save far more power than what would be directly proportional to the lower frequency.
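A rough sketch of why the voltage drop matters so much, using the same P = C times V squared times F relation with C held constant. The 20% voltage reduction is an assumed figure for illustration only; how far the voltage can actually drop at a given frequency depends on the process and the design.

```python
def relative_power(v_scale, f_scale):
    """Power relative to a baseline when voltage and frequency are scaled.

    Follows P = C * V^2 * f with C unchanged, so the ratio is v_scale^2 * f_scale.
    """
    return v_scale ** 2 * f_scale

freq_only = relative_power(1.0, 0.5)      # half the clock, same voltage
freq_and_volt = relative_power(0.8, 0.5)  # half the clock, 20% lower voltage (assumed)

print(f"frequency only:        {freq_only:.2f}x power")
print(f"frequency and voltage: {freq_and_volt:.2f}x power")
```

Under these assumed numbers, halving the clock alone halves the power, while also dropping the voltage brings it to roughly a third of the baseline.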

VLM 4759 days ago

"which can save far more power"

yes, right out of the equation provided.

In "CS" terms that may be better understood on HN than "EE" terms, electrical power scales O(n squared) with voltage and O(n) with frequency.

If you really wanna get people riled up and talking you can roll out the old power "EE" stuff about maximum power transfer happening when source and sink impedance are the same, and you want to get the most bang for your buck so you'd like that, right, and a transistor gate being near infinite resistance would imply ... Or if you like to think about interconnects being signal to noise level limited, then an RF analysis about noise voltage across a resistor vs preamp noise figure vs current bias from a communications standpoint would imply... But it turns out in practice most of the time, the first mental model is by far the most effective way to look at it compared to these.

tbrownaw 4759 days ago

"It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock"

Suppose CPU A has an adder that takes one clock cycle to run an add instruction. When two registers are being added, the instruction goes thru the entire adder in one clock cycle and affects on average some % of the transistors.

Suppose CPU B has a pipelined adder that takes two clock cycles to run an add instruction. When two registers are being added, the instruction goes thru half of the adder in one cycle, and the other half in the next cycle, and affects about half of that same % of the transistors each time. BUT! This is a pipelined adder, and doesn't just do one instruction at a time. During the first cycle, when our instruction is in the first part of the adder, some other add instruction is still going thru the second part of the adder and affecting the other half of whatever % of the transistors. And during the second cycle of our instruction, the next instruction is going thru the first half. So even tho any one instruction only affects half of the adder at a time, the entire adder still gets affected every clock cycle.
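The occupancy argument can be sketched as a toy simulation. This is only a model of which stages are busy on each cycle, assuming a steady stream of back-to-back add instructions and no stalls; `active_stages` is a made-up helper, not anything from a real simulator.

```python
def active_stages(num_stages, num_instructions):
    """Return, for each clock cycle, the set of pipeline stages doing work.

    Instruction i enters stage 0 on cycle i and advances one stage per cycle.
    """
    total_cycles = num_instructions + num_stages - 1
    schedule = []
    for cycle in range(total_cycles):
        busy = {cycle - i for i in range(num_instructions)
                if 0 <= cycle - i < num_stages}
        schedule.append(busy)
    return schedule

# CPU A: single-stage adder. CPU B: two-stage pipelined adder.
for name, stages in (("A", 1), ("B", 2)):
    for cycle, busy in enumerate(active_stages(stages, 4)):
        print(f"CPU {name} cycle {cycle}: {len(busy)}/{stages} of the adder busy")
```

Apart from the first and last cycle, when the pipeline is filling or draining, both halves of CPU B's adder are busy on every cycle.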

deepblueq 4759 days ago

In that example, CPU B's adder can also be clocked twice as fast. If so, it's getting twice the work done and using twice the power (ignoring cache misses and the like for the moment). If it's clocked the same as A, its performance and power usage will be almost the same as A.

Roughly speaking, power used = transistors switching per unit time. Performance should also follow that pretty closely, depending on the efficiency of the design. At some level, you should be able to look at any instruction and find a corresponding number of transistors that need to switch for it to execute.
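A back-of-the-envelope sketch of that model, with power taken as switching events per second times a fixed energy per event. Every number here is invented for illustration, and the model deliberately ignores voltage changes and the extra latches a pipelined design needs.

```python
def adder_model(switches_per_add, adds_per_second, joules_per_switch=1e-15):
    """Return (throughput, power in watts) for a toy adder.

    Power is switching events per second times energy per event.
    All inputs are made-up illustrative values.
    """
    power_watts = switches_per_add * adds_per_second * joules_per_switch
    return adds_per_second, power_watts

SWITCHES_PER_ADD = 2000  # hypothetical, assumed equal for both adders

perf_a, power_a = adder_model(SWITCHES_PER_ADD, 3e9)    # CPU A at 3 GHz
perf_b, power_b = adder_model(SWITCHES_PER_ADD, 3e9)    # CPU B at 3 GHz, pipeline full
perf_b2, power_b2 = adder_model(SWITCHES_PER_ADD, 6e9)  # CPU B at 6 GHz, pipeline full

for label, perf, power in (("A @ 3 GHz", perf_a, power_a),
                           ("B @ 3 GHz", perf_b, power_b),
                           ("B @ 6 GHz", perf_b2, power_b2)):
    print(f"{label}: {perf:.1e} adds/s, {power * 1e3:.1f} mW, "
          f"{perf / power:.2e} adds per joule")
```

In this simplified model the adds-per-joule figure is identical in all three cases, which is the "orthogonal" claim in numeric form. It holds only as long as the energy per switch and the switches per add really stay constant.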

Deep pipelining keeps more silicon active at any given time, increasing both performance and power consumption. Because of cache misses and the like, efficiency will drop somewhat. Double the stages also doesn't quite equal double the switches per time, for various reasons. Therefore, deeper pipelines = worse performance per watt but better performance per dollar (not sure how well that'll hold in ridiculous cases like Prescott).

From what I heard, Bulldozer only has one more stage than Haswell (15 vs. 14, don't quote me on that) - not nearly enough to account for the differences we see between them.

What I'm noting is that there are many, many more factors at play than just pipelining. In the case of Bulldozer, I've been hearing quite a bit about minor parts that they found needed more work, most notably branch prediction. It sounds like they've got lots of things that will improve performance with no power or die size downsides. The number I saw bandied about for Steamroller was a 30% performance increase. I have some trouble believing it's quite that big, but if they pull it off, that will be an amazing chip for being 32nm. It hints to me that the macroscale architecture is A-OK, and they just screwed up some small but important things.

wmf 4759 days ago

"It seems to me that if you've got double the clock, the likely explanation is that half the transistors are switching per clock, and power consumption should be orthogonal to clock/IPC ratio."

Nope; a lot of the latches are switching every cycle, so power is higher at higher frequency. This is what doomed NetBurst-style design.

deepblueq 4758 days ago

Couldn't a 90nm transistor switch at 8 GHz or so in this kind of application? I'm not sure of the exact numbers, but at 1/16th the area occupied, capacitance is much lower, letting it switch far faster.

Just making up some numbers, how about 30% of gates switch on every clock, and 3x the switching speed for modern gates (it's probably much higher, but I'm being conservative here):

NetBurst: (0.3 * 3) / ((0.3 * 3) + (0.7 * 6)) = 17.6% power

Bulldozer: (0.3 * 4.5) / ((0.3 * 4.5) + (0.7 * 18)) = 9.7% power

Sandy Bridge: (0.3 * 3.6) / ((0.3 * 3.6) + (0.7 * 18)) = 7.9% power

So basically, NetBurst is ridiculous, though that shouldn't be news to anyone. Bulldozer doesn't look to be doing so bad as all that, and the numbers improve if the speed is more than 3x.

(I have no idea what the real numbers are, if someone tells me I'll update this.)
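For anyone who wants to plug in better numbers, the arithmetic above can be reproduced with a short script. The function and parameter names are my own labels for the terms in the three expressions; the 0.3 fraction and the 6 and 18 GHz figures are the same made-up inputs, not real data.

```python
def clocked_share(clock_ghz, max_switch_ghz, always_fraction=0.3):
    """Evaluate (a * clock) / (a * clock + (1 - a) * max_switch).

    always_fraction is the assumed share of gates switching on every clock;
    max_switch_ghz is the assumed switching-rate figure for the other gates.
    """
    always = always_fraction * clock_ghz
    rest = (1 - always_fraction) * max_switch_ghz
    return always / (always + rest)

# (clock in GHz, assumed switching-rate figure in GHz), as in the post above.
chips = {
    "NetBurst": (3.0, 6.0),
    "Bulldozer": (4.5, 18.0),
    "Sandy Bridge": (3.6, 18.0),
}

for name, (clock, limit) in chips.items():
    print(f"{name}: {clocked_share(clock, limit):.1%}")
```

Changing `always_fraction` or the second number in each pair shows how sensitive the percentages are to the guessed inputs.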
