Hacker News
by cataphract 2305 days ago
I don't think his comment about CLOCK_MONOTONIC_RAW being slow to query applies anymore. It used to be slow because it was not implemented in the vDSO library, so every query included the overhead of a syscall. But there was a big vDSO refactoring that landed in 5.3 that I think fixed this problem.

Edit: found the patchset. It includes benchmarks for several architectures as well: https://lore.kernel.org/linux-arm-kernel/20190621095252.3230...


This is great news!

clock_monotonic has a much larger failure surface for intra- (and inter-) machine timings than clock_monotonic_raw. A misconfigured ntp can cause bad slew in clock_monotonic. For clock_monotonic_raw, the main source of failures should be the oscillator controlling your CPU. If that fails, you have bigger problems.

> The generic implementation includes the arch specific one and lives in "lib/vdso".

Is this the shared object that gets mapped to the address space of each process?

Yes.
It's a fun fact that on cloud VMs (AWS, etc) vDSO gettime doesn't exist, so if you rely on vDSO to make time measurement free, it's not.
Maybe this is true for AWS VMs that use Xen. I believe that Linux VMs on Azure do not have this problem, since they use the Hyper-V reference time page, which can be queried from the vDSO.
I believe newer generation AWS VMs, like C5, use kvm clock source now and not xen. On the older ones switching to tsc speeds up things.
About a year ago I was writing a personal profiling framework in C++. To test it, I profiled how long it took to get the time and whether that scaled (multiple threads making the same call shouldn't interfere with each other). I ran it on Windows, on a Linux guest on a Windows host, and on an AWS instance. Your post finally explains why the AWS graphs were all over the place!
Any idea why this is? I'm even more curious since you've singled out "cloud VMs" from all VMs.
My understanding is one of the reasons that the virtual time stuff in clouds doesn't work in a straightforward way is that your VM can migrate to another host, where reading TSC could give the appearance of discontinuous time, including jumping backwards in time which would be very bad. This is not a problem unless your VMs are migratory, which I guess is something I associate with GCE. And clouds seem to me to have more variety of hardware than I'd expect in my own private infrastructure, and that variety comes with a great many TSC quirks.
Intel's VMCS includes a TSC offset field as well as TSC scaling. These allow for a stable RDTSC across migrations between hosts, modulo actual time lost to migration blackout.

(I work on virtualization in GCE)

Cool! I looked up more info on this and ended up here, http://www.brendangregg.com/blog/2017-05-04/the-pmcs-of-ec2....

But I only skimmed this and I don't see that it mentions migration across hosts. Only that the hypervisor is able to expose the MSRs or PMCs.

So do gettimeofday and the various clock_gettime methods [1] hit the vDSO on GCP, or do they incur a syscall, or something else?

---

[1] Not all of the clock_gettime sources hit the vDSO even on bare metal Linux on typical x86 hardware, but many of the important ones do.

Wouldn't preventing access to high-resolution clocks also be a mitigation for speculative side channels?