Hacker News new | ask | show | jobs
by datenwolf 1063 days ago
In other news: Water is wet.

The high resolution precision time counters are derived from the system base clock, usually operating at ~33MHz, which translates exactly into that 30ns granularity observed.

If you really want robust time derived timestamp identifiers, truncate the high resolution timer to at best 10µs resolution replace the low bits with the hash of them, concatenated with the value of the CPU cycle counter (`RDTSC` on x86).

7 comments

The HRT patch set for Linux (which is eight years old) claims that Linux had a 10μs clock resolution before the patch, and conversations on SO suggest the resolution is now 1ns, so I believe this information is dated, or at least OS dependent.
I don’t think this is right. I am pretty doubtful of the discussion of the granularity of the results of clock_gettime, but I failed to find any sources and I don’t have a suitable computer to hand to experiment.

But here are two more doubts:

1. On at least some systems, clock_gettime will be implemented by a vdso call that (on x86) uses rdtsc to give you the time, so you should expect its results to be pretty highly correlated with an rdtsc immediately afterwards, so you aren’t really adding useful entropy (whereas you are losing time-locality of your identifiers which may be desirable if they are to end up as primary keys somewhere)

2. On some systems (eg some AMD processors), rdtsc gave me pretty low precision results (eg rounding to the nearest 10 cycles) so you also won’t necessarily get the entropy you would expect from the description of the instruction. I failed to find a reference for this too apart from an offhand mention in a paper about timing attacks.

> replace the low bits with the hash of them

Doesn't this hash-step only increase the probability of collisions?

What is this step intended to solve?

> replace the low bits with the hash of them, concatenated with the value of the CPU cycle counter (`RDTSC` on x86)

you're concatenating two values and then taking the hash of the combination, ie:

hash(low bits + CPU cycle counter)

But a hash() destroys information.

When you are trying to reduce collisions, why would you use a function that is known to introduce collisions?

Because the raw value from the timestamp will be very low entropy and have the short scale variation concentrated in just a few bits. A hash not just destroys information, it also creates entropy by mixing that information over all the bits that come out of it. And using 64 bit hash replacing a 64 bit nanosecond counter that has short term entropy of less than 16 bits, you're in fact reducing the likelihood of a collision by a factor of 2^48.
The hash, in this case, is just one deterministic way to shorten the CPU counter to few bits which can then be used to increase the entropy of the timestamp by replacing the timestamps stale bits. What's being asked here is not why use some compressed bits of the CPU counter increases the entropy of the timestamp overall. Rather, why you'd use a hash of the entropy providing information to do this since hashes allow for many:1 mappings (i.e. allow for decreases in entropy) while never providing better than 1:1 mappings (i.e. preserving entropy). At best, the hash is preserving the entropy of the timestamp and counter bits and, at worst, it is throwing some away. The question that follows: is there a better way that preserves the entropy of the inputs without the risk of throwing some away and, if so, was there some other reason to still use the hash? This is what I took amelius to be asking.

Knowing the timestamp delta is ~30ns then even a 1 THz processor would only execute 30,000 cycles before said timestamp increases and the outcome stays unique anyways. From that perspective, taking the low 16 bits of the cycle counter is already guaranteed to help produce perfectly unique timestamps, and without the risk of introducing hash collisions. It's also significantly easier to compute and now monotonic* now, whereas hashes were neither. From that perspective, I too don't see what value the hash is supposed to add to the information being fed to it in this case.

In threaded/distributed/parallel scenarios you may wish to throw the lane/core/node number in the bits as well. In the case having the same timestamp is supposed to be invalid this leaves you a simple deterministic way to tiebreak the situation as part of the creation of the timestamp itself.

*A 10 GHz CPU running for a little over 58 years could risk a 64 bit counter rollover in the middle of a time delta. If that's too close for comfort or you want the code to work on e.g. 32 bit counter systems, you'd need to eat more cycles and registers to set another bit to the outcome of whether current cycle count is < the one at the start of the current timestamp delta.

Yeah, when I was thinking about the hash, I thought of it as stuffing to fill the unused portion of the number that would look better than using zeroes.

TIMESTAMP010111101010101TSC "looks better" than TIMESTAMP000000000000000TSC even though they contain the same information.

I would drop the hash, it's deceptive.

I don't believe it breaks the monotonicity, though? I mean, it would if it weren't already broken. If you're taking the low 16 bits of the TSC, then a rollover in those 16 bits during the same timestamp will already go backwards. TIMESTAMP0...0 follows TIMESTAMPf...f.

> it also creates entropy

It's a nitpick, but it concentrates the entropy. It doesn't create any.

I do think it will make the answer more clear, as the hash concentrates the less than 64 bits of entropy on that 128 bits of original data into a usable 64 bits package.

Actually hashes do create entropy (every computation creates entropy in some form or another). What's the entropy of a 4 bit number? What's the entropy of a 4 bit number hashed by a 64 bit hash function? The act of computation does in fact create entropy, as per the 2nd law of thermodynamics, a part of which shows up in the hash.
Why?

    low bits + CPU cycle counter
is enough. No need of the hash()
You don't know precisely at which frequency the cycle counter runs. Depending on the system load it might either run faster or slower than the lowest bits the HPTC. For what it's worth this part is more or less nondeterministic, so the sane thing to do, is spread out the information as much as possible (maximize entropy), in order to minimize the probability of collisions.
That's ok, the bits past the low bits are just there to avoid collisions, not an actual measure of high precision time beyond the low bits.

It's not worse than the hash solution, I'm just saying it's not necessary to hash it if the only objective is to reduce collisions.

In fact the hashing solution, if it is replacing the low bits with a hash of low bits plus something else, is actually destroying valuable time information.

That only works, if you know exactly, that the low bits are constant. Otherwise you may run into the issue that due to unsteady rate of RDTSC between two processes/threads that may be preemptively unscheduled between reading the HPTC and the RDTSC you might again end up with colliding time stamps. And if you took the derivative between timestamps taken in succession, you might even find is to be non-monotonic in places.

The combination of multiple counters incremented by individual unsteady clocks used to be a source for pseudo random scrambler sequences; these days we prefer LFSRs, but overall this is something that can be weird.

Hence my recommendation: Just throw xxHash32 on concatenation of the HPTC's low bits and the CPU clock cycle counter, and forgo any pretense of monotony in the low bits (because very likely you don't have it anyway).

I thought clock_gettime() usually does use rdtsc(p) on Linux? Possibly depending on the particular clock type (montonic, realtime, etc). Either way I'd be interested in knowing more.
RDTSC is directly influenced by frequency scaling. So while monotonic, its clock interval is neither constant, nor deterministic on modern systems.

Here's a small online visualization of RDTSC average and standard deviation I just hacked together: https://gist.github.com/datenwolf/151486f6d73c9b25ac701bdbde...

On a system with frequency scaling you can see that under higher load the difference between RDTSC in subsequent iterations of a tight loop that does nothing else than reading that register will drop. Here's how it looks on the system I'm currently using: https://www.youtube.com/watch?v=FKKjSJ1JZ78

> RDTSC is directly influenced by frequency scaling

Unfortunate wording, RDTSC itself is not influenced by frequency scaling, it has constant frequency on modern CPUs after all. Your video nicely shows that RDTSC delta is influenced by CPU frequency, as expected, but how does it affect using RDTSC as a clock? On my CPU RDTSC seems to tick at 3GHz, for example. I wonder how precise it is though, how much its frequency can drift from spec.

Ah crud, you're right. What'd really be interesting to see is the ratio between d/dt rdtsc and d/dt clock_monotonic.
I did update my program, now it measures the ratio.
Wouldn't those operations reduce the accuracy of the time stamp?
What is this "accuracy" you're talking about? Between the scheduler hanging over a process's head, ready to yank away its compute time for milliseconds between the syscall to read out the HPTC or the CPU cycle counter, and the process actually writing the timestamp into a variable/in-memory struct, reading the HPTC from the underlying hardware also not being super accurate, and the CPU cycle counter being influence by frequency scaling, on a bog-standard machine the highest precision you can reasonable expect is on the order of 100µs or so.
Modern CPUs don't really give you accurate nanosecond-scale time stamps anyways. The CPU will randomly speed up or slow down, execute instructions out of order, and even speculatively execute instructions. Not to mention that it'll have a dozen different clocks - which are not guaranteed to be in sync.
This may be coming from a place of ignorance, but if that were the case, then time would drift significantly constantly due to Intel speed step for example. And if that were the source of truth for time, then when your computer is off, it wouldn’t be able to keep. I’m pretty sure they all have real time clock chips in the motherboards.
There's time keeping (which uses the real time clock) and high resolution clocks, which are good for relative timers. They're never the same components.
> The CPU will randomly speed up or slow down

constant_tsc has been a thing for more than a decade

> execute instructions out of order, and even speculatively execute

rdtscp serialises

> Not to mention that it'll have a dozen different clocks - which are not guaranteed to be in sync.

synchronised_tsc has been a think for about 6 years now

None of these are performant, no?

Generally, you can have consistency, speed, or low cost - but not more than two at the same time.

Invariant (Constant) TSC is detectable via `cpuid` and applies to `rdtsc/rdtscp` by default. In that aspect, there's no tradeoff being made there (observable to software) AFAICK.
interesting results! i guess it depends in if you consider 35ish cycles expensive or not.

[https://community.intel.com/t5/Software-Tuning-Performance/H...]

Integrating a hash might improve (not guarantee) the uniqueness situation but not the monotonicity situation. Right?
Well, you can always attempt to catch the CPU cycle counter overflow (happens at roughly 10Hz on current machines), adding up the carries and add it to a nanosecond counter shifted up by a few bits. Problem with the CPU cycle counter is, that it's not in lockstep with the HPTC, due to dynamic frequency scaling.

If you really, really, really need system wide, nanosecond precision timestamps, you'll have to resort to dedicated hardware incrementing a counter at the desired rate, and with a well known latency for each read access. On the hardware and driver level you'd have some MMIO port mapped into user space with an identity transform between bus address and process address space.

However this still doesn't solve the problem, of the scheduler being able to throw in several milliseconds between reading the high precision timer value into a register and writing it back into the memory holding the timestamp variable. Seriously, on your typical computer system software derived timestamps at more than 100µs resolution are kind of bogus.

Some of this is not current.

Constant-rate TSCs are ~15-20 years old. Synchronized TSCs are at least ten.

Also RDTSC produces a 64-bit result, so it does not overflow at a rate of several Hz.

It's 64 bit on 64 systems. In the world of hard realtime applications there's still a huge armada of systems out there in the field running on 32 bit (think motion control, CNC machines, industrial robotics). If you're developing software that's concerned with this kind of precision, you might find yourself confronted with such "outdated" systems more often, than not.

Also see https://news.ycombinator.com/item?id=36814762

Or UUID7