Hacker News new | ask | show | jobs
by codyps 3395 days ago
For those actually curious about the implementation on solaris/illumos, heres a quick rundown (from looking at current illumos source):

- comm_page (usr/src/uts/i86pc/ml/comm_page.s) is literally a page in kernel memory with specific variables that is mapped (usr/src/uts/intel/ia32/os/comm_page_util.c) as user|read-only (to be passed to userspace, kernel mapping is normal data, AFAICT)

- the mapped comm_page is inserted into the aux vector at AT_SUN_COMMPAGE (usr/src/uts/common/exec/elf/elf.c)

- libc scans auxv for this entry, and stashes the pointer it containts (usr/src/lib/libc/port/threads/thr.c)

- When clock_gettime is called, it looks at the values in the COMMPAGE (structure is in usr/src/uts/i86pc/sys/comm_page.h, probing in usr/src/lib/commpage/common/cp_main.c) to determine if TSC can be used.

- If TSC is usable, libc uses the information there (a bunch of values) to use tsc to read time (monotonic or realtime)

Variables within comm_page are treated like normal variables and used/updated within the kernel's internal timekeeping.

Essentially, rather than having the kernel provide an entry point & have the kernel know what the (in the linux case) internal data structures look like, here libc provides the code and reads the exported data structure from the kernel.

So it isn't reading the time from this memory page, it's using TSC. In the case of CLOCK_REALTIME, corrections that are applied to TSC are read from this memory page (comm_page).

1 comments

So it isn't reading the time from this memory page, it's using TSC. In the case of CLOCK_REALTIME, corrections that are applied to TSC are read from this memory page (comm_page).

This summary only applies to Illumos. The Solaris implementation diverged significantly around build 167 (2011) long after the last OpenSolaris build Illumos was based on (build 147). It changed again significantly in 2015.

I believe Circonus contributed an alternate implementation that does some of the same things as Solaris in 2016:

https://www.circonus.com/2016/09/time-but-faster/

With that said, you are correct that whether or not it will read from a memory page instead depends on which interfaces you are using (e.g. get_hrusec()) and other subtle details.

So the only things I'm seeing in the linked circonus code that differ from illumos:

1. no use of a kernel supplied page, determines skew/etc itself in userspace 2. stores information on a per-cpu level, and tries to execute cpuid on the same cpu as rdtsc.

I'm presuming you're talking about #2 (and #1 is just due to the linked item being a library without kernel integrations)? Perhaps with some more kernel support so that the actual cpu rdtsc ran on can be reliably determined?

This still doesn't clarify the part about "shared page in which the time is updated" and is read from. This statement appears to imply TSC is not (necessarily) used (otherwise I'd categorize it under "uses values from memory page to fixup TSC", like Illumos' current implimentation). I'm still not sure how that can be done reasonably.

Is there just a 1 micro second timer running whenever a user task is being executed that is bumping the value? Wouldn't that be quite a bit of overhead? Or some HW trick? I mean, you could generate a fault on every read, and have the kernel populate the current data, but that seems just as bad as a syscall.