by j_seigh 169 days ago
Ok, so people use NTP to "synchronize" their clocks and then write applications that assume the clocks are in exact sync and can use timestamps for synchronization, even though NTP can see the clocks aren't always in sync. Do I have that right?
3 comments

If you are an engineer at Google dealing with Spanner, then you can in fact assume clocks are well synchronized and can use timestamps for synchronization. If you get commit timestamps from Spanner you can compare them to determine exactly which commit happened first. That’s a stronger guarantee than a typical Serializable database like PostgreSQL provides: https://www.postgresql.org/docs/current/transaction-iso.html...

That’s the radical developer simplicity promised by TrueTime mentioned in the article.

That’s actually not at all what TrueTime guarantees, and assuming they’ve solved a physical impossibility is a dangerous founding assumption for higher-level tech (an assumption which, thankfully, Spanner does not make).

What TrueTime says is that clocks are synchronized within some delta just like NTP, but that delta is significantly smaller thanks to GPS time sync. That enables applications to have tighter bounds on waiting to see if a conflict may exist before committing which is why Spanner is fast. CockroachDB works similarly but given the logistical challenge of getting GPS receivers into data centers, they worked to achieve a smaller delta through better NTP-like timestamps and generally get fairly close performance.

https://programmingappliedai.substack.com/p/what-is-true-tim...

> Bounded Uncertainty: TrueTime provides a time interval, [earliest, latest], rather than a single timestamp. This interval represents the possible range of the current time with bounded uncertainty. The uncertainty is caused by clock drift, synchronization delays, and other factors in distributed systems.
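
To make the interval idea concrete, here is a minimal Python sketch of an interval clock plus the "wait out the uncertainty" step described above. It is not Google's API: `SketchTrueTime`, `TTInterval`, `commit`, and the fixed `epsilon` are all hypothetical names and assumptions for illustration, and a real system would derive the bound from its time sources rather than hard-code it.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # lower bound on true time, in seconds
    latest: float    # upper bound on true time, in seconds


class SketchTrueTime:
    """Hypothetical stand-in for a TrueTime-like clock (not the real API)."""

    def __init__(self, epsilon, clock=time.time):
        self.epsilon = epsilon  # assumed bound on local clock error, seconds
        self.clock = clock

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, ts):
        # True only once ts is guaranteed to be in the past.
        return self.now().earliest > ts


def commit(tt, sleep=time.sleep):
    # Pick a timestamp at or beyond any possible "true now".
    ts = tt.now().latest
    # Commit wait: block until the timestamp has definitely passed.
    while not tt.after(ts):
        sleep(tt.epsilon / 4)
    return ts
```

With a 4 ms bound, `commit(SketchTrueTime(0.004))` blocks for roughly 8 ms before returning; shrinking the bound shrinks the wait, which is the performance argument made above.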

That’s exactly what I’m saying, but you simply provided more details. TrueTime guarantees clocks are well synchronized, and of course that means synchronized to within a reasonable upper bound. It’s no more possible for clocks to be absolutely synchronized than for two line segments drawn independently to have absolutely the same length.
> you can compare them to determine exactly which commit happened first

This is the part I was referring to. You cannot just compare timestamps and know which happened first. You have to actually handle the case where you don’t know if there’s a happens-before relationship between the timestamps. That’s a very important distinction.
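
A small sketch of that distinction, assuming each event carries a raw clock reading as an `(earliest, latest)` pair. The `Order` enum and `compare` function are hypothetical names for illustration, not part of any Spanner client; this is about raw interval readings, not about timestamps a database has already ordered for you.

```python
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    UNKNOWN = "unknown"


def compare(a, b):
    """Compare two (earliest, latest) readings taken on different hosts."""
    if a[1] < b[0]:
        return Order.BEFORE   # a certainly ended before b began
    if b[1] < a[0]:
        return Order.AFTER    # b certainly ended before a began
    return Order.UNKNOWN      # intervals overlap: order cannot be inferred
```

The third branch is the one that a plain `ts_a < ts_b` comparison silently hides.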

I quote from Spanner docs at https://docs.cloud.google.com/spanner/docs/true-time-externa...

> External consistency states that Spanner executes transactions in a manner that is indistinguishable from a system in which the transactions are executed serially, and furthermore, that the serial order is consistent with the order in which transactions can be observed to commit. Because the timestamps generated for transactions correspond to the serial order, if any client sees a transaction T2 start to commit after another transaction T1 finishes, the system will assign a timestamp to T2 that is higher than T1's timestamp.

Of course there is always the edge case where two commits have the same commit timestamp. From the perspective of Spanner, they happen simultaneously and there is no way to determine which happened first. But there is no need to: there is no causal relationship between them. If you insist, you can arbitrarily assign a happens-before relationship in your own code and nothing will break.
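
For the "arbitrarily assign" part, one way to do it in application code is to break ties on some stable key. A minimal sketch, where the `(txn_id, commit_ts)` tuples and the choice of transaction id as tiebreaker are assumptions for illustration:

```python
# Hypothetical commit records: (transaction id, commit timestamp).
commits = [("t3", 105), ("t1", 100), ("t2", 105)]


def total_order(records):
    # Order by commit timestamp first, then by id to break ties deterministically.
    return sorted(records, key=lambda r: (r[1], r[0]))
```

Any deterministic tiebreaker works, since equal timestamps imply neither commit could have observed the other.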

Alternatively, you could guarantee the same synchronization using PPS and PTP to each host's DCD pin of their serial port or to specialized hardware such as modern PTP-enabled smart NICs/FPGAs that can accept PPS input. GPS+PPS gets you to within 20-80ns global synchronization depending on implementation (assuming you're all mostly in the same inertial frame), and allows you to make much stronger guarantees than TrueTime (due to higher precision distributed ordering guarantees, which translate to lower latency and higher throughput distributed writes).
Of course, you can do this in good conditions. The extremely powerful part that TrueTime brings is how the system degrades when something goes wrong.

If everyone is synced to +/- 20ns, that's great. Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.

The other benefit of building in this uncertainty to the underlying software design is you don't have to have your entire infrastructure on the same hardware stack. If you have one datacenter that's 20yrs old, has no GPS infrastructure, and operates purely on NTP - this can still run the same software, just much more slowly. You might even keep some of this around for testing - and now you have ongoing data showing what will happen to your distributed system if GPS were to go away in a chunk of the world for some sustained period of time.

And in a brighter future, if we're able to synchronize everyone's clocks to +/- 1ns, the intervals just get smaller and we see improved performance without having to rethink the entire design.
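
Rough arithmetic for how the same design scales with the bound, assuming a commit waits about twice the uncertainty bound (pick the upper edge of the interval, then wait for the lower edge to pass it). The function names are hypothetical, and the serial-rate figure ignores any overlap between the wait and replication latency, so treat it as a back-of-envelope bound only.

```python
def commit_wait_seconds(epsilon_s):
    # Assumed model: wait roughly 2 * epsilon before acknowledging a commit.
    return 2 * epsilon_s


def max_serial_commits_per_second(epsilon_s):
    # Upper bound on back-to-back conflicting commits if each must finish
    # its wait before the next begins (a simplifying assumption).
    return 1 / commit_wait_seconds(epsilon_s)


# Illustrative bounds only: an NTP-only site, a GPS+PPS site, a future site.
for label, eps in [("10 ms", 10e-3), ("20 ns", 20e-9), ("1 ns", 1e-9)]:
    print(label, commit_wait_seconds(eps), max_serial_commits_per_second(eps))
```

Same software in all three cases; only the input bound changes.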

> Then when someone flies over your datacenter with a GPS jammer (purposeful or accidental), this needs to not be a bad day where suddenly database transactions happen out of order, or you have an outage.

Most NTP/PTP appliances have internal clocks that are OCXO or rubidium that have holdover (even for several days).

If time is that important to you then you'll have them, plus perhaps some fibre connections to other sites that are hopefully out of range of the jamming.
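
As a rough model of what holdover buys you: once the reference is lost, the uncertainty grows with the oscillator's fractional frequency error multiplied by elapsed time. A minimal sketch, where `holdover_uncertainty` is a hypothetical helper, the linear model ignores aging and temperature effects, and the numbers in the usage note are illustrative rather than datasheet values:

```python
def holdover_uncertainty(base_s, frac_freq_error, elapsed_s):
    """Worst-case time error after elapsed_s seconds without a reference.

    base_s: uncertainty at the moment the reference was lost (seconds)
    frac_freq_error: assumed oscillator error as a dimensionless fraction
    """
    return base_s + frac_freq_error * elapsed_s
```

For example, starting from 20 ns with an assumed fractional error of 1e-10, one day (86400 s) of holdover gives `holdover_uncertainty(20e-9, 1e-10, 86400)`, about 8.66 microseconds. Software that consumes an uncertainty interval can simply be fed this growing bound.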

> fibre connections to other sites that are hopefully out of range of the jamming.

I guess it's not inconceivable that eventually there's a global clock network using a White-Rabbit-like protocol over dedicated fibre. But if you have to worry about GPS jamming you probably have to worry about undersea cable cutting too.

> But if you have to worry about GPS jamming you probably have to worry about undersea cable cutting too.

GPS jamming can be done by a random truck driver:

* https://www.cnet.com/culture/truck-driver-has-gps-jammer-acc...

Cutting cables at the bottom of the sea is an entirely different class of attack (anchors notwithstanding).

Good thing cesium fountains are very accurate then...
In summary, with different business requirements you would build a different technical solution.
> and allows you to make much stronger guarantees than TrueTime (due to higher precision distributed ordering guarantees, which translate to lower latency and higher throughput distributed writes).

TrueTime is the software algorithm for managing the timestamps. It’s agnostic to the accuracy of the underlying time source. If the source were inaccurate, you’d get looser bounds and, as you note, higher latency. Google already does everything you suggest for TrueTime while also having atomic clocks in places.

Yup! I was referring to the original TrueTime/Spanner papers, not whatever's currently deployed. The original paper makes reference to distributed ordering guarantees at the milliseconds' scale precision, which implies many more transactions in flight in the uncertain state and coarser distributed ordering guarantees than the much tighter upper bound you can set with nanoseconds' precision and microseconds' comms latency...
More than a decade of progress, probably in no small part from Google pushing vendors to improve hardware :)
Amen. :)
TrueTime is based on GPS and local atomic clocks. Google's latest timemasters are even better, around 10ns average.
Isn't that because Google has its own atomic clocks, rather than NTP which is (generally) using publicly available atomic clocks?
More that they use GPS to synchronize the clocks. Having your own atomic clock doesn’t really improve your accuracy except within the single data center where you have it deployed (although I’m sure there are techniques for synchronizing with low bounds against nearby atomic clocks + GPS to get a really tight bound, so they don’t need one in every data center).
Depending on the application you would generally use PTP to get sub-microsecond accuracy. The real trick is that the architecture should tolerate various clocks starting out of sync or jumping out of sync, and self-correct.
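
One small building block for that kind of tolerance is noticing when the local wall clock has been stepped. A minimal sketch, comparing wall-clock progress against Python's monotonic clock; `JumpDetector` and its threshold are hypothetical, and this only catches local steps, not a steady offset between hosts.

```python
import time


class JumpDetector:
    """Flags wall-clock steps by comparing against a monotonic clock."""

    def __init__(self, threshold_s, wall=time.time, mono=time.monotonic):
        self.threshold_s = threshold_s
        self.wall = wall
        self.mono = mono
        self.last_wall = wall()
        self.last_mono = mono()

    def check(self):
        w, m = self.wall(), self.mono()
        # Wall-clock progress minus monotonic progress since the last check.
        jump = (w - self.last_wall) - (m - self.last_mono)
        self.last_wall, self.last_mono = w, m
        # Return the detected step in seconds, or 0.0 if within threshold.
        return jump if abs(jump) > self.threshold_s else 0.0
```

A caller might widen its uncertainty bound or refuse to issue timestamps whenever `check()` returns a nonzero value, then resume once the clock has been re-disciplined.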
*misuse timestamps for synchronization