by rdtsc 137 days ago
> Notice that in the Skylake Client microarchitecture the RDTSC instruction counts at the machine’s guaranteed P1 frequency independently of the current processor clock (see the INVARIANT TSC property), and therefore, when running in Intel® Turbo-Boost-enabled mode, the delay will remain constant, but the number of instructions that could have been executed will change.

rdtsc may execute out of order, so an lfence (previously cpuid) is sometimes placed before it to order the read; there is also rdtscp.
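A rough sketch of the three variants, assuming GCC or Clang on x86-64 and the intrinsics from `<x86intrin.h>`. The function names (`tsc_plain`, `tsc_fenced`, `tsc_rdtscp`) are made up for illustration and are not taken from the linked kernel header.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp, _mm_lfence (GCC/Clang, x86 only) */

/* Plain read: the CPU may reorder it relative to surrounding instructions. */
static inline uint64_t tsc_plain(void) {
    return __rdtsc();
}

/* lfence before the read so earlier instructions finish first (Intel
 * semantics); the trailing lfence keeps later work from starting before
 * the counter has been read. */
static inline uint64_t tsc_fenced(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* rdtscp waits for earlier instructions to execute before reading the
 * counter, and also stores the IA32_TSC_AUX value in *aux. It does not
 * stop later instructions from starting early. */
static inline uint64_t tsc_rdtscp(unsigned int *aux) {
    return __rdtscp(aux);
}
```

Something like `uint64_t t0 = tsc_fenced(); work(); uint64_t t1 = tsc_fenced();` then gives `t1 - t0` in TSC ticks, where `work()` stands in for whatever is being timed.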

See https://github.com/torvalds/linux/blob/master/arch/x86/inclu...

And just because the rdtsc rate is constant doesn't mean the processor clock is constant; that could still be fluctuating.
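To put numbers on that, here is a small arithmetic sketch. The helper names and the frequencies in the usage note are invented for illustration; real code would have to obtain the TSC rate from the platform.

```c
#include <stdint.h>

/* Wall-clock nanoseconds covered by a TSC delta, given the (constant)
 * TSC rate in Hz. */
static double tsc_ticks_to_ns(uint64_t ticks, double tsc_hz) {
    return (double)ticks * 1e9 / tsc_hz;
}

/* Approximate core cycles that fit in the same interval, assuming the
 * core ran at core_hz for the whole interval. */
static double core_cycles_in(uint64_t ticks, double tsc_hz, double core_hz) {
    return (double)ticks * core_hz / tsc_hz;
}
```

With a hypothetical 3 GHz TSC, a delta of 3,000,000 ticks is always 1 ms, but that same 1 ms holds about 4,000,000 core cycles at 4 GHz (turbo) and about 2,000,000 at 2 GHz.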

1 comments

The issue with that is that a load fence may be very detrimental to perf. It doesn't really matter if rdtsc executes out of order in this code anyway, and there is no need for sync between cores.
You could first measure the perf impact of the fence instruction and then subtract that out? But yeah, I guess it may not matter much for a quick and dirty calibration loop.
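One way the measure-and-subtract idea could look, assuming GCC or Clang on x86-64. `tsc_fenced`, `timing_overhead` and `net_ticks` are hypothetical names, and taking the minimum over many trials is just one common way to filter out interrupts and other noise.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Fenced timestamp read (lfence; rdtsc; lfence). */
static inline uint64_t tsc_fenced(void) {
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Smallest delta between two back-to-back fenced reads over `trials`
 * attempts: an estimate of the cost of the timing code itself. */
static uint64_t timing_overhead(int trials) {
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = tsc_fenced();
        uint64_t t1 = tsc_fenced();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Subtract the estimated overhead from a raw measurement, clamping at 0. */
static uint64_t net_ticks(uint64_t raw, uint64_t overhead) {
    return raw > overhead ? raw - overhead : 0;
}
```

Calling `timing_overhead(1000)` once up front and then passing each raw `t1 - t0` through `net_ticks` removes most of the fence cost from the result, though the fences still perturb the code being measured.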

I found somewhere (https://aloiskraus.wordpress.com/2018/06/16/why-skylakex-cpu...) that the pause instruction has this wild cycle difference between different CPUs, and it caused some grief. I had no idea. I stopped doing low-level coding a while back.