by db48x 303 days ago
> The so-called TDP of the Ryzen 9950X is 170W. The used heat sinks are specified to dissipate 165W, so that seems tight.

TDP numbers are completely made up. They don’t correspond to watts of heat, or of anything at all! They’re just a marketing number. You can't use them to choose the right cooling system at all.

https://gamersnexus.net/guides/3525-amd-ryzen-tdp-explained-...

When I see the term TDP, I remember what I read in the "Thermal Design Document" of the Intel Core2Quad Q6600 and the family it belongs to:

> The thermal solution bundled with the CPUs is not designed to handle the thermal output when all the cores are utilized 100%. For that kind of load, a different thermal solution is strongly recommended (paraphrased).

I never used the stock cooler bundled with the processor, but what kind of dark joke is this?

Most states of “100% utilization” as you’d see in `top` are not 100% thermal output or even close. Cores waiting for memory accesses count as utilized in the former sense but will not produce as much heat as one that is actually using the ALU etc. That’s why special make-work like Prime95 is used for stress testing overclocking/thermals: it will saturate the cores with enough unblocked arithmetic work to generate more heat than having 1000 browser tabs open does.
You're not going to get anywhere near full thermal load with just integer arithmetic either - you need to saturate the floating point units for that.
This is more how I think too: using a cooler that supports your CPU TDP is generally fine because most people will not run a CPU 100% for an extended amount of time. But in this case they seem to be running the CPU 100% for an extended amount of time AND are using an under-spec'ed cooler (even if it is just by 5W).

You don't even need to change the actual cooler since for AMD CPUs you can pretty much customize the TDP whatever way you want, and by default they run well above their efficiency curve. For example, my 7600X has a default TDP of 105W but I run it in Eco Mode (65W) with undervolt and I barely lose any performance. Even if I did no undervolt, running the CPU in Eco Mode is generally preferable since the performance loss is still negligible (~5%).
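As a rough check on the Eco Mode numbers above, here is a small Python sketch of the performance-per-watt arithmetic. It assumes the chip draws exactly its configured limit in both modes, which is a simplification, and the function name is made up for illustration.

```python
def perf_per_watt_gain(default_w, eco_w, perf_loss):
    """Ratio of perf/W in the reduced-power mode to perf/W at the default limit.

    Assumption: the CPU draws exactly the configured limit in both modes.
    perf_loss is the fractional performance lost (0.05 means 5%).
    """
    default_eff = 1.0 / default_w
    eco_eff = (1.0 - perf_loss) / eco_w
    return eco_eff / default_eff


# Figures quoted in the comment: 105W default, 65W Eco Mode, ~5% loss.
gain = perf_per_watt_gain(default_w=105, eco_w=65, perf_loss=0.05)
print(f"{gain:.2f}x")  # 1.53x
```

Under those assumptions, giving up about 5% of performance buys roughly 1.5x the work per watt.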

For a general purpose system, this line of thinking makes sense. However, the desktop system in question was built to be daily driven and support some high performance code research, so it had to endure some serious loads for a desktop computer.

I went the other way and overspecced the CPU cooler and added some silent but high CFM capable fans on the system. The motherboard I got was able to adjust all fans depending on the system temps, so it scaled from a very silent desktop to a low-key space heater automatically under load.

Instead of undervolting the processor, I was using a tweaked on-demand governor on the system which stuck to lower power levels more than usual, so unless I was doing software development and testing things, it stayed cool and silent.

BTW, by 100%, I'm talking about completely saturating the CPU pipeline. Not pseudo 100% where CPU reports saturation but most of the load is iowait.
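To make the iowait distinction concrete, here is a minimal sketch that separates execution time from iowait using two samples of the aggregate `cpu` line from Linux's `/proc/stat`. The sample lines below are synthetic, and the function name is hypothetical. Note that this only filters out I/O wait; a core stalled on memory still counts as busy here, so it is not a measure of thermal load.

```python
def busy_fraction(before, after):
    """Fraction of the interval spent executing, excluding idle and iowait.

    `before` and `after` are aggregate "cpu ..." lines from /proc/stat.
    Linux field order: user nice system idle iowait irq softirq steal ...
    """
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    delta = [y - x for x, y in zip(b, a)]
    total = sum(delta)
    if total == 0:
        return 0.0
    idle, iowait = delta[3], delta[4]
    return (total - idle - iowait) / total


# Synthetic samples: most of the interval is spent waiting on I/O.
sample_1 = "cpu 100 0 50 800 50 0 0 0"
sample_2 = "cpu 200 0 100 850 850 0 0 0"
print(busy_fraction(sample_1, sample_2))  # 0.15
```

On a real system the two lines would come from reading `/proc/stat` twice, a second or so apart.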

Man that was a beast of a CPU back in the day.

The Conroe Intel era was amazing for the time.

That was such a fun time to be into hardware. For years Intel had the money and relationships to keep the Pentium 4 everywhere even though AMD had the better product. The P4 might edge ahead in video rendering but the Athlon would win overall and use less power.

Then Conroe launched and the balance shifted. Even the cheapest Core2Duo chips were competitive against the best P4s and the high-end C2Ds rivaled or beat AMD. https://web.archive.org/web/20100909205130/http://www.anandt...

AND those chips overclocked to the moon. I got my E6420 to 3.2 GHz (from 2.133 GHz) just by upping the multiplier. A quick search makes me think my chip wasn't even that great.

Absolutely. Intel was also keeping up the tick-tock cadence. I could be misremembering, but it seemed like every tock Intel was getting something like 20% improvements over the last tock. It really wasn't until ~Haswell that that slowed down and continued to slow down to basically nothing. I think Kaby Lake IIRC was the last major performance jump from Intel. Everything else has just been incremental changes.
One of the reasons that Intel only shipped 5% incremental updates was that AMD was basically non-existent, due both to Intel pressuring them and to AMD having made a massive mistake with the Bulldozer/Piledriver architecture.

They vastly underestimated how much of a bottleneck a single FPU would be on a multicore/SMP processor.

Then AMD took things personally and architected Zen/EPYC. The rest is history.

Certainly, and by that time Intel just sort of dropped all the balls. They were already struggling to do die shrinks and it seems like they simply lost all their ability to develop the architecture.

That had maybe happened years earlier. The thing about Conroe is, IIRC, its ancestry came from the P3 and Intel's mobile CPU designs. The P4 was a steady evolution of the NetBurst architecture. The years of improvements to Conroe were mostly just incremental changes and porting over features from NetBurst (such as hyperthreading). Once that all played out, Intel really didn't have anywhere else to go or plans on how to evolve the architecture. They fell back on the same old "let's just add wider SIMD instructions (AVX)".

I also seem to recall that Intel made fab bets that ultimately didn't pay off. Again, IIRC, I believe they were trying to keep using the same deep-UV light (193nm?) rather than moving to EUV lithography. That caused them to dump a fair bit of money into fabrication that never really paid off.

Buying parts for that particular desktop was quite fun:

    - Me: Can I get a Q6600?
    - Seller: But, that's... Quad core?
    - Me: Yes, I'll have it.
    - Seller: OK. RAM?
    - Me: I'll get OCZ Flex-XLC Hybrids. 1GB.
    - Seller: *Gives one*
    - Me: I'll get four.
    - Seller: ?
    - Me: Yes, four please.
Motherboard was an MSI P35 Platinum. Fun times.
I always used the stock cooler, because it's quiet and nothing uses the cpu to its fullest :).
You are correct. In fact these guys measured a maximum socket power consumption of 240 watts using a 9950X at stock settings, running prime95. That is far above the "170 watt" TDP:

https://hwbusters.com/cpu/amd-ryzen-9-9950x-cpu-review-perfo...

I don’t understand this argument. If the CPU dissipated an equal number of watts of heat energy as it consumed from the wall, there wouldn’t be any energy left to do actual useful work. Isn’t the extra 100W accounted for by things like changing the state of flip-flops? In other words, mustn’t one consider the entropy reduction of the system as an energy sink?
Clocking and changing register states requires charging and discharging the gate capacitance of a bunch of MOSFET transistors. The current that results from moving all that charge around encounters resistance, which converts it to heat. Silicon is only a "semi" conductor after all.
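The charging and discharging described above is usually summarized by the standard first-order CMOS dynamic power relation. The symbols are not from this thread: $\alpha$ is the fraction of the capacitance that switches per cycle, $C$ the total switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. Leakage is ignored, so treat it as an approximation.

```latex
% Each full charge/discharge of a node dissipates about C V^2 in the
% resistance of the transistors and wiring, hence:
P_{\text{dyn}} \approx \alpha \, C \, V^{2} \, f
```

Essentially all of that power ends up as heat in the chip, which is why socket power and heat output are the same number for practical purposes.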

You are correct that there is energy bound in the information stored in the chip. But last I checked, our most efficient chips (e.g., using reversible computing to avoid wasting that energy) are still orders of magnitude less efficient than those theoretical limits.

Thank you for encouraging me to go on this educational adventure. I have now heard of Landauer’s principle, which says that erasing one bit of information dissipates at least about 2.9e-21 joules at room temperature: https://en.wikipedia.org/wiki/Landauer%27s_principle
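For scale, a short Python sketch of that figure. It assumes a temperature of 300 K and reuses the 240 W socket power quoted earlier in the thread; the function name is made up for illustration.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact value in the SI definition


def landauer_limit_joules(temperature_k):
    """Minimum energy dissipated when erasing one bit: k * T * ln(2)."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


e_bit = landauer_limit_joules(300.0)  # assumed room temperature
print(f"{e_bit:.2e} J per bit")  # 2.87e-21 J per bit

# How many bit erasures per second 240 W would pay for at that limit.
erasures_per_s = 240.0 / e_bit
print(f"{erasures_per_s:.1e} erasures per second")  # 8.4e+22 erasures per second
```

Real CPUs erase nowhere near that many bits per second, which is another way of saying that almost none of the 240 W is the thermodynamically unavoidable part.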
I think the numbers are more like <1W used in actual information processing, >239W lost to heat. Information and the transformation of it does have some inherent energy cost. But it is very, very small. And you end up getting that back as heat somewhere else down the line anyways.
Nope. Remember that you cannot destroy energy. The energy you use to flip the flip flop still exists, only now it’s just disordered waste heat instead of electricity.
Energy cannot be created or destroyed, but it can enter and leave an open system. When I lift a 10kg box 1 meter in the air, I don’t raise its temperature at all, and I only raise mine a tiny bit, yet I have still done work on the box and therefore have imparted it energy. The energy came from food I ate earlier, and was ultimately stored in the box as gravipotential energy.

Is this not analogous to storing energy in the EM fields within the CPU?

CPUs don't store nontrivial amounts of energy, and even if storing a 1 was a significantly higher energy level than a 0 (or vice versa) there's no plausible workload that would be causing the CPU to switch significantly more 0s to 1s than 1s to 0s (or vice versa).
Yes, but only briefly. When you study the thermodynamics of information you’ll discover that it’s actually erasing information that has a cost. Every time the CPU stores a value in a register it erases the previous value, using up energy. In fact, every individual transistor has to erase the previous state on basically every clock cycle.

Curiously there is a minimum cost to erase a single bit that no system can go below. It’s extremely small, billions of times smaller than the amount of energy our CPUs use every time they erase a bit, but it exists. Look up Landauer’s Limit. There is a similar limit on the maximum amount of information stored in a system which is proportional to the surface area of the sphere that the information fits inside. Exceed that limit and you’ll form a black hole. We’re nowhere near that limit yet either.

>In fact, every individual transistor has to erase the previous state on basically every clock cycle.

This is incorrect in both directions.

Only transistors whose inputs are changing have to discharge their capacitance.

This means that if the inputs don't change nothing happens, but if the inputs change then the changes propagate through the circuit to the next flip flop, possibly creating a cascade of changes.

Consider this pathological scenario: The first input changes, then a delay happens, then the second input changes so that the output remains the same. This is known as a "glitch". Even though the output hasn't changed, the downstream transistors see their input switch twice. Glitches propagate through transistors and not only that, if another unfortunate timing event happens, you can end up accumulating multiple glitches. A single transistor may switch multiple times in a clock cycle.
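The glitch scenario can be sketched with a toy event simulation. This is only an illustration: the XOR gate, the 2-unit skew between inputs, and the function name are all assumptions, and real gates have analog delays that this ignores.

```python
def count_output_toggles(gate, input_events, initial_inputs):
    """Apply timed input changes in order and count output transitions.

    input_events: list of (time, input_index, new_value) tuples.
    Returns (number_of_output_toggles, final_output).
    """
    inputs = list(initial_inputs)
    out = gate(*inputs)
    toggles = 0
    for _, idx, value in sorted(input_events):
        inputs[idx] = value
        new_out = gate(*inputs)
        if new_out != out:
            toggles += 1
            out = new_out
    return toggles, out


def xor(a, b):
    return a ^ b


# Both inputs rise, but input b arrives 2 time units after input a.
skewed = [(0, 0, 1), (2, 1, 1)]
print(count_output_toggles(xor, skewed, (0, 0)))  # (2, 0)
```

The output ends at 0, the same value it started with, yet it toggled twice on the way, and everything downstream paid the switching energy for both transitions.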

Switching transistors costs energy, which means you end up with "parasitic" power consumption that doesn't contribute to the calculated output.

To be a bit flippant, you can absolutely destroy energy by creating some mass.

Then again most of us do not have a particle accelerator nearby looking for the Higgs boson.

>> The energy you use to flip the flip flop

> To be a bit flippant

I see what you did here :)

I’m sorry, but no. Mass is just energy.
What happens to the energy that did the useful work?
I have a 65W TDP CPU, and the difference in power draw (measured at the outlet) from idle to full CPU load is over 100W; it seems to just raise the clock until it hits 95°C, so if I limit the CPU fan's top speed, the power draw goes down.
Yep. Modern CPUs continually adjust their clock multiplier based on what their temperature is doing, plus a few timers. If you have a better cooler then you’ll get more performance out of the same CPU, but at the cost of drawing more power and producing more heat.
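A toy model of that behavior, as a sketch only: it assumes power grows with the cube of frequency, that steady-state temperature is ambient plus thermal resistance times power, and that the chip simply steps the clock up until the next step would cross the limit. Real boost algorithms also use timers and current and power limits; every number and name here is hypothetical.

```python
def settle(thermal_resistance_c_per_w, ambient_c=25.0, limit_c=95.0,
           base_mhz=3000, step_mhz=100, max_mhz=6000, watts_at_base=60.0):
    """Return (clock_mhz, watts) where the toy boost loop settles."""

    def power(mhz):
        # Assumed scaling: power proportional to frequency cubed.
        return watts_at_base * (mhz / base_mhz) ** 3

    mhz = base_mhz
    while mhz + step_mhz <= max_mhz:
        next_temp = ambient_c + thermal_resistance_c_per_w * power(mhz + step_mhz)
        if next_temp > limit_c:
            break
        mhz += step_mhz
    return mhz, power(mhz)


for label, r_th in (("modest cooler", 0.5), ("better cooler", 0.3)):
    mhz, watts = settle(r_th)
    print(f"{label}: {mhz} MHz, {watts:.0f} W")
```

In this model the better cooler (lower °C/W) settles at a higher clock and a much higher power draw, while raising `ambient_c` pushes both down, which matches the fan-limiting observation above.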
Wow, I can't believe how BS this TDP is! I feel like a total idiot! I've always assumed it's sorta-kinda a tight upper bound on power consumption, perhaps with some allowance for "imperfections" in the dissipation properties of the CPU, and that I shouldn't sweat the details.

Couldn't this count as false/misleading advertizing though?

It's thermal design power, i.e. it's the power that it's designed for, not absolute max.
No, they don’t design the chip with these numbers in mind. The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.
That seems a little too cynical. It matters how a customer might use a chip, such as the type of cooling that would be expected in a typical system using that model, and that's informed by the advertised specifications. Base clocks and the amount of SRAM also figure into TDP. No doubt there are completely arbitrary aspects to TDP driven purely by profit-focused market segmentation, but it's not just that.

That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.

No, clock speed and cache have nothing to do with TDP. AMD uses a simple formula to calculate TDP. It is the temperature of the IHS minus the air temperature measured at the cpu cooler’s intake fan, divided by a conversion factor in °C/W.

But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.

I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU is dependent on the temperature too, since the colder you can make the CPU the more power it will voluntarily use (it just raises the clock multiplier until it measures the temperature of the CPU rising without leveling off). And as you said there are a bunch of other factors as well.

> The formula doesn’t even mean anything, in real physical terms.

From your description the formula is how you would calculate the power for which a certain heatsink at a given ambient temperature would result in the specified IHS temperature.

The °C/W number is not a conversion factor but the thermal resistance[1] of the heatsink & paste, that is a physical property.

So unless I misunderstood you it's very much something real in physical terms.
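The formula as described in this exchange can be written down directly. This is a sketch of the description above, not an official implementation; the parameter names are assumptions and the input values are purely illustrative.

```python
def tdp_watts(t_case_c, t_ambient_c, theta_ca_c_per_w):
    """Power at which a cooler with thermal resistance theta_ca holds the
    IHS at t_case_c, given intake air at t_ambient_c."""
    if theta_ca_c_per_w <= 0:
        raise ValueError("thermal resistance must be positive")
    return (t_case_c - t_ambient_c) / theta_ca_c_per_w


# Illustrative inputs: 61.8 °C case, 42 °C intake air, 0.189 °C/W cooler.
print(round(tdp_watts(61.8, 42.0, 0.189), 1))  # 104.8
```

Both points in the exchange show up here: the relation itself is ordinary heat-transfer physics, but the result depends entirely on which three inputs are chosen, so picking the inputs is enough to land on any desired wattage.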

[1]: https://fscdn.rohm.com/en/products/databook/applinote/common...

>The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.

Are you just describing product segmentation? i.e. how the Ryzen 5700X and 5800X are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?

Yep. The 5800X is a higher bin specifically because it can clock higher than the ones in the 5700X bin. That certainly makes them draw more power, so they give them a higher TDP number too. But the TDP doesn’t have anything to do with how much power the cpu will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.
>But the TDP doesn’t have anything to do with how much power the cpu will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts its own frequency multiplier based on its own measured temperature, meaning it’ll draw more power if you cool it better.

I don't get it, are you referring to the phenomenon that different workloads have different power consumption (e.g. a bunch of AVX512 floating point operations vs a bunch of NOPs), therefore TDP is totally made up? I agree that there's a lot of factors that impact power usage, and CPUs aren't like a space heater where if you let it run at full blast it'll always consume the TDP specified, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least are vaguely correlated with some limit of the CPU (e.g. PPT limit on AMD platforms).

Huh, I always thought it was “total dissipated power”. Like you’d use to spec a power supply.
It's pretty insane to see someone say something like: “TDP is about thermal watts, not electrical watts. These are not the same.” Watts are watts.

But yeah, TDP means nothing. If you stick plenty of cooling and run the right motherboard revision your "TDP" can be whatever you want it to be until the thing melts.

"TDP is about average watts, not peak watts" would be an honest way of saying it.
But in the end that's still not actually true in many modern desktop chips. You can take a 65W part, and with a "stock" motherboard firmware, good cooling, and the right workload end up averaging way more than 65W. Or if you have it in a hot room it just might end up using less than 65W.

TDP is more of a rough idea of how much power the manufacturer wanted to classify the part as. It ultimately only loosely relates to the actual heat or electrical usage in practice.

> Couldn't this count as false/misleading advertizing though?

For what, exactly? TDP stands for "thermal design power" - nothing in that means peak power or most power. It stopped being meaningful when CPUs learned to vary clock speeds and turbo boost - what is the thermal design target at that point, exactly? Sustained power virus load?

> For what, exactly? TDP stands for "thermal design power"

The chip is not designed for this rate of power dissipation; and it is not the rate of power dissipation that you can expect to get from the chip.

> The chip is not designed for this rate of power dissipation

Says who? AMD advertises the chip as having a base clock of 4.3 GHz over all cores. The 9950X pulls somewhere around 220W at 5 GHz all cores and with how power scales, 170W at the advertised 4.3 GHz seems more than plausible. Seems perfectly within reason that the advertised frequency and the advertised TDP are aligned.
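A quick sketch of that scaling argument, using the 220W at 5 GHz figure from the comment. The exponents are assumptions: 1 corresponds to holding voltage fixed, 3 to voltage rising in proportion to frequency (from the common first-order rule that dynamic power goes as V²f), and real chips land somewhere in between.

```python
def scaled_power(p_ref_w, f_ref_ghz, f_ghz, exponent):
    """Scale a reference power to another frequency as P ~ f**exponent."""
    return p_ref_w * (f_ghz / f_ref_ghz) ** exponent


for exponent in (1, 2, 3):
    watts = scaled_power(220, 5.0, 4.3, exponent)
    print(f"P ~ f^{exponent}: {watts:.1f} W")
# P ~ f^1: 189.2 W
# P ~ f^2: 162.7 W
# P ~ f^3: 139.9 W
```

Under these assumptions 170W at 4.3 GHz falls inside the range the simple scaling rules give, so the claim is at least plausible, though it says nothing about what the chip draws once it boosts.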

I wish Anandtech was still around as iirc they did have charts for all this, which nobody else seems to do :/

> and it is not the rate of power dissipation that you can expect to get from the chip.

Again, says who? Whose expectations? This is a consumer chip, and the expectation of a consumer chip is not that it spends 100% of its time running prime95 or a similar "power virus" workload. I expect that if I buy this chip, while I would have intervals of >170W, I'd also have long periods of much less than 170W. If I have a cooler designed to sustain 170W of cooling, that's going to work out on average just fine as there's thermal mass in the system.

> Says who?

Says AMD and says Intel, apparently. At the link, there is an official explanation (sort of) of how the TDP figure is derived.