I think you could prove that on Linux by counting the number of instructions executed. I think perf can get that info for you. Not sure if Windows has something similar.
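For reference, a minimal Python sketch of pulling the instruction count out of `perf stat`. It assumes Linux with perf installed, a non-hybrid CPU (so the event shows up as plain `instructions` or `instructions:u`), and the `-x,` CSV layout where the first field is the counter value and the third is the event name. `parse_instructions` and `count_instructions` are hypothetical helper names, not part of any library.

```python
import subprocess


def parse_instructions(perf_csv: str) -> int:
    """Return the 'instructions' count from `perf stat -x,` output.

    Assumption: first CSV field is the counter value, third is the event
    name (e.g. 'instructions' or 'instructions:u').
    """
    for line in perf_csv.splitlines():
        fields = line.split(",")
        if len(fields) < 3:
            continue  # skip blank lines and anything that isn't a counter row
        value, event = fields[0].strip(), fields[2]
        if event.split(":")[0] == "instructions":
            if not value.isdigit():
                # perf prints e.g. "<not supported>" or "<not counted>" here
                raise RuntimeError(f"instructions counter unavailable: {value!r}")
            return int(value)
    raise ValueError("no instructions counter found in perf output")


def count_instructions(cmd: list[str]) -> int:
    """Run cmd under `perf stat` and return its instruction count (Linux only)."""
    result = subprocess.run(
        ["perf", "stat", "-x,", "-e", "instructions", "--", *cmd],
        capture_output=True,
        text=True,
        check=True,
    )
    # perf stat writes its counters to stderr; the command's own output goes to stdout
    return parse_instructions(result.stderr)
```

Usage would be something like `count_instructions(["./my_benchmark"])`. Whether you get user-only (`:u`) counts depends on the system's `perf_event_paranoid` setting.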
Hmm, different workloads will have different mixes of instructions, and those can take very different amounts of time per instruction (e.g. a chain of uncached memory lookups vs. a tight loop of register-register arithmetic). If you had two runs of the same workload and one was throttled, then yes, you could compare instructions per unit time. But comparing Prime95 against the other benchmark that way is likely misleading.
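To make the "same workload" condition concrete, here is a rough Python sketch comparing two runs by instructions per second. Each run is an `(instructions, wall_seconds)` pair, e.g. from perf plus a timer. It rests on an assumption: a deterministic workload doing fixed work retires roughly the same number of instructions each time, so a large mismatch suggests the runs weren't comparable. The 2% tolerance and the function names are hypothetical choices for illustration.

```python
def instructions_per_second(instructions: int, seconds: float) -> float:
    return instructions / seconds


def throttle_ratio(baseline, throttled, tolerance=0.02):
    """Throttled IPS divided by baseline IPS for two runs of the SAME workload.

    baseline / throttled: (instructions, wall_seconds) tuples.
    A result below 1.0 means the throttled run got less done per second.
    tolerance: assumed allowable relative difference in instruction counts.
    """
    base_instr, base_s = baseline
    thr_instr, thr_s = throttled
    # Very different instruction counts mean different work was done,
    # so IPS would be comparing different instruction mixes.
    if abs(thr_instr - base_instr) > tolerance * base_instr:
        raise ValueError("instruction counts differ too much; not the same workload")
    return instructions_per_second(thr_instr, thr_s) / instructions_per_second(base_instr, base_s)
```

With equal instruction counts this reduces to a ratio of run times, which is the point: the instruction count only serves as a sanity check that both runs did the same work.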
I see. But I expect that some instructions cost more (i.e. generate more heat) than others. Someone mentioned AVX instructions, for example; presumably 1 million AVX-512 fused-multiply-add instructions cost a lot more than 1 million loads (we could probably arrange for the right proportion of the loads to hit a CPU cache vs. go to main memory, so that both sets take the same total time). Even without AVX, I imagine things like 64-bit integer divisions or CRC32 instructions cost more than loads or stores, though I don't know by how much.