No, they don’t design the chip with these numbers in mind. The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.
That seems a little too cynical. It matters how a customer might use a chip, such as the type of cooling that would be expected in a typical system using that model, and that's informed by the advertised specifications. Base clocks and the amount of SRAM also figure into TDP. No doubt there are completely arbitrary aspects to TDP driven purely by profit-focused market segmentation, but it's not just that.
That said, it's definitely very frustrating as someone who does the occasional server build. Not only does TDP not reflect minimum or maximum power draw for a CPU package itself, but it's also completely divorced from power draw for the chipset(s), NICs, BMCs (ugh), etc, not to mention how the vendor BIOS/firmware throttles everything, and so TDP can be wildly different from power draw at the outlet. The past 5 years have kind of sucked for homelab builders. The Xeon E3 years were probably peak CPU and full-system power efficiency when accounting for long idle times. Can you get there with modern AMD and Intel chips? Maybe. Depends on who you ask and when. Even with identical CPUs, differences in motherboard vendor, BIOS settings, and even kernel can result in drastically different (as in 2-3x) reported idle power draw.
No, clock speed and cache have nothing to do with TDP. AMD uses a simple formula to calculate TDP. It is the temperature of the IHS minus the air temperature measured at the cpu cooler’s intake fan, divided by a conversion faction in °C/W.
But they don’t use real temperatures from real systems. They just make up a different set of temperatures for each CPU that they sell, so that the TDP comes out to the number that they want. The formula doesn’t even mean anything, in real physical terms.
I agree that predicting power usage is far more difficult than it should be. The real power usage of the CPU is dependent on the temperature too, since the colder you can make the CPU the more power it will voluntarily use (it just raises the clock multiplier until it measures the temperature of the CPU rising without leveling off). And as you said there are a bunch of other factors as well.
> The formula doesn’t even mean anything, in real physical terms.
From your description the formula is how you would calculate the power for which a certain heatsink at a given ambient temperature would result in the specified IHS temperature.
The °C/W number is not a conversion factor but the thermal resistance[1] of the heatsink & paste, that is a physical property.
So unless I misunderstood you it's very much something real in physical terms.
It might be a useful formula _if_ the numbers were real. Note that when AMD tells you that a 9900X cpu is has a 120W TDP, that's because they picked three numbers to plug into that formula that result in 120 popping out. They picked the result of 120 first, and then found numbers to put into the formula so that it gives you that result.
But the reason I say that it’s physically meaningless is that real heat dissipation is strongly temperature dependent. The thermal conductivity of a heatsink goes up as the temperature goes up because heat is more effectively transferred into the air at higher temperatures.
>The marketing department picks the number they want based on how they want customers to think about the chip, and which competitors they want you to compare it against. They just plug in whatever numbers are needed into the formula so that the number comes out how they want it.
Are you just describing product segmentation? ie. how the ryzen 5700x and 5800x are basically the same chip, down to the number of enabled cores, except for clocks and power limit ("TDP")?
Yep. The 5800X is a higher bin specifically because it can clock higher than the ones in the 5700X bin. That certainly makes them draw more power, so they give them a higher TDP number too. But the TDP doesn’t have anything to do with how much power the cpu will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts it’s own frequency multiplier based on it’s own measured temperature, meaning it’ll draw more power if you cool it better.
>But the TDP doesn’t have anything to do with how much power the cpu will draw or how much heat it will generate in practice. Those numbers vary quite a lot; the CPU continuously adjusts it’s own frequency multiplier based on it’s own measured temperature, meaning it’ll draw more power if you cool it better.
I don't get it, are you referring to the phenomenon that different workloads have different power consumption (eg. a bunch of AVX512 floating point operations vs a bunch of NOPs), therefore TDP is totally made up? I agree that there's a lot of factors that impact power usage, and CPUs aren't like a space heater where if you let it run at full blast it'll always consume the TDP specified, but that doesn't mean TDP numbers are made up. They still vaguely approximate power usage under some synthetic test conditions, or at the very least is vaguely correlated to some limit of the CPU (eg. PPT limit on AMD platforms).
No, the TDP number doesn’t even vaguely approximate anything. You can’t use the number to predict anything, or to plan, or to estimate your electric bill, or anything like that.
Isn't TDP supposed to be an upper bound of how much power budget there is for a chip when it's running under maximum IPC (which implies AVX512 workload spread across all cores with all the test data in all the L1 caches)? I guess that power budget can vary due to process imperfections and/or CPU bugs but saying that it doesn't approximate anything is hard to believe. How about the PSU then, e.g. is 800W PSU a made-up number as well?