| What causes real-world problems with leap seconds is actually unrelated to the nasty interactions of metrology and solar time -- it's a specific and avoidable problem with how NTP (and many OSes/languages) represent time -- it's a types issue. The right way for computers to represent time is with a number that represents the number of constant-rate ticks that have elapsed past a some agreed-upon epoch. If you know what the epoch is and how long each tick is (lots of people use 1 / 9.192 GHz), it is easy to know how many ticks are between any two time values, and you can convert a time value with one epoch to one with a different epoch and tick rate -- you can do everything people expect to do with time. There are no numbers that represent an invalid time value, and for each moment, there is a unique time value that represents it. There's a one-to-one mapping with no nasty edge cases. Leap seconds are a step function that is added to a constant-rate timescale (whose name is "TAI") in order to generate a discontinuous timescale (whose name is "UTC") that never is too different from solar time. There is nothing fundamentally abhorrent about leap seconds -- there are just good and bad ways to represent, disseminate, and compute with timescales that involve leap seconds. The right way to handle leap seconds can be seen with many GNSSes and PTP (very high precision hardware-assisted time synchronization over Ethernet). GPS, BeiDou, Galileo, and PTP all involve dissemination and computation on time values -- and with dire consequences for failure/downtime/inaccuracy. The designers of those systems all somehow converged on the choice to separate out the nice, predictable, constant-rate and discontinuity-free part of UTC from the nasty step function (the leap second offset). Times in all those systems are represented as the tuple (TAI time at t, leap offset at t). This means that the entire system can calculate and work with (discontinuity-free and constant-rate) TAI times but also truck around the leap offsets so when time values need to be presented to a user (or anything that requires a UTC time), the leap offset can be added then. Crucially, all the maths that are done on time values are done on TAI values, so calculating a time difference or a frequency is easy and the result is always correct, regardless of the leap second state of affairs. Representing UTC time as a tuple makes the semantics of that data type easy to reason about -- the "time" bit is in the first element and is completely harmless -- the edge cases have all live in the second half of the tuple. NTP and Unix (and everything descending and affected by those) have made the mistake of representing and transmitting time as a single integer, TAI(t) + leap offset(t). This is not a data representation that has sensical semantics and it is very hard to reason about it. First of all, the leap second offset is nondeterministic and also unknown -- there is no way to get it from NTP and there is no good way to know the time of the next leap event. Second of all, there are repeated time values for different moments in time (and when a negative leap second will happen, there will be time values that represent no moments in time). Predictably, introducing nondeterministic discontinuities doesn't work so well in the real world. There are a bunch of bugs in NTP software and OS kernels and applications that make themselves shown every time there is a leap second. It's not even just NTP clients that struggle -- 40% of public Stratum-1 NTP servers had erroneous behavior [0] related to the 2015 leap second! Given that level of repeated and widespread failure, the right solution is not to blame programmers -- it should be to blame the standard. The UTC standard and how NTP disseminates UTC are fundamentally not fit for computer timekeeping. GNSS receivers and PTP hardware get used in mission-critical applications (synchronizing power grids and multi-axis industrial processes, timestamping data from test flights and particle accelerators) all the time -- and even worse, there's no way to conveniently schedule downtime/maintenance windows during leap second events! "Leap smear" isn't an acceptable solution for those applications, either -- you can't lie about how long a second is to the Large Hadron Collider. GNSS and PTP systems handle leap second timescales without a hitch by representing UTC time with the right data type -- a tuple that properly separates two values that have the same unit (seconds) but have vastly different semantics. The NTP and unix timestamp approach of directly baking the discontinuities into the time values reliably causes problems and outages. The leap second debacle is not about solar time vs atomic time; it's about the need for data types that accurately represent the semantics of what they describe. [0]: http://crin.eng.uts.edu.au/~darryl/Publications/LeapSecond_c... |