| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by datenwolf 1063 days ago

In other news: Water is wet.

The high resolution precision time counters are derived from the system base clock, usually operating at ~33MHz, which translates exactly into that 30ns granularity observed.

If you really want robust time derived timestamp identifiers, truncate the high resolution timer to at best 10µs resolution replace the low bits with the hash of them, concatenated with the value of the CPU cycle counter (`RDTSC` on x86).

7 comments

hinkley 1063 days ago

The HRT patch set for Linux (which is eight years old) claims that Linux had a 10μs clock resolution before the patch, and conversations on SO suggest the resolution is now 1ns, so I believe this information is dated, or at least OS dependent.

dan-robertson 1063 days ago

I don’t think this is right. I am pretty doubtful of the discussion of the granularity of the results of clock_gettime, but I failed to find any sources and I don’t have a suitable computer to hand to experiment.

But here are two more doubts:

1. On at least some systems, clock_gettime will be implemented by a vdso call that (on x86) uses rdtsc to give you the time, so you should expect its results to be pretty highly correlated with an rdtsc immediately afterwards, so you aren’t really adding useful entropy (whereas you are losing time-locality of your identifiers which may be desirable if they are to end up as primary keys somewhere)

2. On some systems (eg some AMD processors), rdtsc gave me pretty low precision results (eg rounding to the nearest 10 cycles) so you also won’t necessarily get the entropy you would expect from the description of the instruction. I failed to find a reference for this too apart from an offhand mention in a paper about timing attacks.

amelius 1063 days ago

> replace the low bits with the hash of them

Doesn't this hash-step only increase the probability of collisions?

What is this step intended to solve?

jjk166 1063 days ago

> replace the low bits with the hash of them, concatenated with the value of the CPU cycle counter (`RDTSC` on x86)

you're concatenating two values and then taking the hash of the combination, ie:

hash(low bits + CPU cycle counter)

amelius 1063 days ago

But a hash() destroys information.

When you are trying to reduce collisions, why would you use a function that is known to introduce collisions?

datenwolf 1063 days ago

Because the raw value from the timestamp will be very low entropy and have the short scale variation concentrated in just a few bits. A hash not just destroys information, it also creates entropy by mixing that information over all the bits that come out of it. And using 64 bit hash replacing a 64 bit nanosecond counter that has short term entropy of less than 16 bits, you're in fact reducing the likelihood of a collision by a factor of 2^48.

zamadatix 1063 days ago

The hash, in this case, is just one deterministic way to shorten the CPU counter to few bits which can then be used to increase the entropy of the timestamp by replacing the timestamps stale bits. What's being asked here is not why use some compressed bits of the CPU counter increases the entropy of the timestamp overall. Rather, why you'd use a hash of the entropy providing information to do this since hashes allow for many:1 mappings (i.e. allow for decreases in entropy) while never providing better than 1:1 mappings (i.e. preserving entropy). At best, the hash is preserving the entropy of the timestamp and counter bits and, at worst, it is throwing some away. The question that follows: is there a better way that preserves the entropy of the inputs without the risk of throwing some away and, if so, was there some other reason to still use the hash? This is what I took amelius to be asking.

Knowing the timestamp delta is ~30ns then even a 1 THz processor would only execute 30,000 cycles before said timestamp increases and the outcome stays unique anyways. From that perspective, taking the low 16 bits of the cycle counter is already guaranteed to help produce perfectly unique timestamps, and without the risk of introducing hash collisions. It's also significantly easier to compute and now monotonic* now, whereas hashes were neither. From that perspective, I too don't see what value the hash is supposed to add to the information being fed to it in this case.

In threaded/distributed/parallel scenarios you may wish to throw the lane/core/node number in the bits as well. In the case having the same timestamp is supposed to be invalid this leaves you a simple deterministic way to tiebreak the situation as part of the creation of the timestamp itself.

*A 10 GHz CPU running for a little over 58 years could risk a 64 bit counter rollover in the middle of a time delta. If that's too close for comfort or you want the code to work on e.g. 32 bit counter systems, you'd need to eat more cycles and registers to set another bit to the outcome of whether current cycle count is < the one at the start of the current timestamp delta.

sfink 1063 days ago

Yeah, when I was thinking about the hash, I thought of it as stuffing to fill the unused portion of the number that would look better than using zeroes.

TIMESTAMP010111101010101TSC "looks better" than TIMESTAMP000000000000000TSC even though they contain the same information.

I would drop the hash, it's deceptive.

I don't believe it breaks the monotonicity, though? I mean, it would if it weren't already broken. If you're taking the low 16 bits of the TSC, then a rollover in those 16 bits during the same timestamp will already go backwards. TIMESTAMP0...0 follows TIMESTAMPf...f.

marcosdumay 1063 days ago

> it also creates entropy

It's a nitpick, but it concentrates the entropy. It doesn't create any.

I do think it will make the answer more clear, as the hash concentrates the less than 64 bits of entropy on that 128 bits of original data into a usable 64 bits package.

datenwolf 1063 days ago

Actually hashes do create entropy (every computation creates entropy in some form or another). What's the entropy of a 4 bit number? What's the entropy of a 4 bit number hashed by a 64 bit hash function? The act of computation does in fact create entropy, as per the 2nd law of thermodynamics, a part of which shows up in the hash.

dheera 1063 days ago

Why?

    low bits + CPU cycle counter

is enough. No need of the hash()

datenwolf 1063 days ago

You don't know precisely at which frequency the cycle counter runs. Depending on the system load it might either run faster or slower than the lowest bits the HPTC. For what it's worth this part is more or less nondeterministic, so the sane thing to do, is spread out the information as much as possible (maximize entropy), in order to minimize the probability of collisions.

dheera 1063 days ago

That's ok, the bits past the low bits are just there to avoid collisions, not an actual measure of high precision time beyond the low bits.

It's not worse than the hash solution, I'm just saying it's not necessary to hash it if the only objective is to reduce collisions.

In fact the hashing solution, if it is replacing the low bits with a hash of low bits plus something else, is actually destroying valuable time information.

datenwolf 1063 days ago

That only works, if you know exactly, that the low bits are constant. Otherwise you may run into the issue that due to unsteady rate of RDTSC between two processes/threads that may be preemptively unscheduled between reading the HPTC and the RDTSC you might again end up with colliding time stamps. And if you took the derivative between timestamps taken in succession, you might even find is to be non-monotonic in places.

The combination of multiple counters incremented by individual unsteady clocks used to be a source for pseudo random scrambler sequences; these days we prefer LFSRs, but overall this is something that can be weird.

Hence my recommendation: Just throw xxHash32 on concatenation of the HPTC's low bits and the CPU clock cycle counter, and forgo any pretense of monotony in the low bits (because very likely you don't have it anyway).

usefulcat 1063 days ago

I thought clock_gettime() usually does use rdtsc(p) on Linux? Possibly depending on the particular clock type (montonic, realtime, etc). Either way I'd be interested in knowing more.

datenwolf 1063 days ago

RDTSC is directly influenced by frequency scaling. So while monotonic, its clock interval is neither constant, nor deterministic on modern systems.

Here's a small online visualization of RDTSC average and standard deviation I just hacked together: https://gist.github.com/datenwolf/151486f6d73c9b25ac701bdbde...

On a system with frequency scaling you can see that under higher load the difference between RDTSC in subsequent iterations of a tight loop that does nothing else than reading that register will drop. Here's how it looks on the system I'm currently using: https://www.youtube.com/watch?v=FKKjSJ1JZ78

vient 1063 days ago

> RDTSC is directly influenced by frequency scaling

Unfortunate wording, RDTSC itself is not influenced by frequency scaling, it has constant frequency on modern CPUs after all. Your video nicely shows that RDTSC delta is influenced by CPU frequency, as expected, but how does it affect using RDTSC as a clock? On my CPU RDTSC seems to tick at 3GHz, for example. I wonder how precise it is though, how much its frequency can drift from spec.

datenwolf 1063 days ago

Ah crud, you're right. What'd really be interesting to see is the ratio between d/dt rdtsc and d/dt clock_monotonic.

datenwolf 1063 days ago

I did update my program, now it measures the ratio.

byteknight 1063 days ago

Wouldn't those operations reduce the accuracy of the time stamp?

datenwolf 1063 days ago

What is this "accuracy" you're talking about? Between the scheduler hanging over a process's head, ready to yank away its compute time for milliseconds between the syscall to read out the HPTC or the CPU cycle counter, and the process actually writing the timestamp into a variable/in-memory struct, reading the HPTC from the underlying hardware also not being super accurate, and the CPU cycle counter being influence by frequency scaling, on a bog-standard machine the highest precision you can reasonable expect is on the order of 100µs or so.

crote 1063 days ago

Modern CPUs don't really give you accurate nanosecond-scale time stamps anyways. The CPU will randomly speed up or slow down, execute instructions out of order, and even speculatively execute instructions. Not to mention that it'll have a dozen different clocks - which are not guaranteed to be in sync.

byteknight 1063 days ago

This may be coming from a place of ignorance, but if that were the case, then time would drift significantly constantly due to Intel speed step for example. And if that were the source of truth for time, then when your computer is off, it wouldn’t be able to keep. I’m pretty sure they all have real time clock chips in the motherboards.

nomel 1063 days ago

There's time keeping (which uses the real time clock) and high resolution clocks, which are good for relative timers. They're never the same components.

blibble 1063 days ago

> The CPU will randomly speed up or slow down

constant_tsc has been a thing for more than a decade

> execute instructions out of order, and even speculatively execute

rdtscp serialises

> Not to mention that it'll have a dozen different clocks - which are not guaranteed to be in sync.

synchronised_tsc has been a think for about 6 years now

lazide 1063 days ago

None of these are performant, no?

Generally, you can have consistency, speed, or low cost - but not more than two at the same time.

kprotty 1063 days ago

Invariant (Constant) TSC is detectable via `cpuid` and applies to `rdtsc/rdtscp` by default. In that aspect, there's no tradeoff being made there (observable to software) AFAICK.

lazide 1063 days ago

interesting results! i guess it depends in if you consider 35ish cycles expensive or not.

[https://community.intel.com/t5/Software-Tuning-Performance/H...]

imglorp 1063 days ago

Integrating a hash might improve (not guarantee) the uniqueness situation but not the monotonicity situation. Right?

datenwolf 1063 days ago

Well, you can always attempt to catch the CPU cycle counter overflow (happens at roughly 10Hz on current machines), adding up the carries and add it to a nanosecond counter shifted up by a few bits. Problem with the CPU cycle counter is, that it's not in lockstep with the HPTC, due to dynamic frequency scaling.

If you really, really, really need system wide, nanosecond precision timestamps, you'll have to resort to dedicated hardware incrementing a counter at the desired rate, and with a well known latency for each read access. On the hardware and driver level you'd have some MMIO port mapped into user space with an identity transform between bus address and process address space.

However this still doesn't solve the problem, of the scheduler being able to throw in several milliseconds between reading the high precision timer value into a register and writing it back into the memory holding the timestamp variable. Seriously, on your typical computer system software derived timestamps at more than 100µs resolution are kind of bogus.

jsolson 1063 days ago

Some of this is not current.

Constant-rate TSCs are ~15-20 years old. Synchronized TSCs are at least ten.

Also RDTSC produces a 64-bit result, so it does not overflow at a rate of several Hz.

datenwolf 1063 days ago

It's 64 bit on 64 systems. In the world of hard realtime applications there's still a huge armada of systems out there in the field running on 32 bit (think motion control, CNC machines, industrial robotics). If you're developing software that's concerned with this kind of precision, you might find yourself confronted with such "outdated" systems more often, than not.

Also see https://news.ycombinator.com/item?id=36814762

BiteCode_dev 1063 days ago

Or UUID7