by rachelbythebay 2043 days ago
A negative leap second (the appalling shitshow from the article) would make us skip a second, though.

Going back one second is what generally happens now when a Linux box does the "inserting leap second" thing. It goes from 23:59:59.999999 to 23:59:59.000000, then runs that whole second again. You get to 23:59:59.999999 again, and then you finally roll over to 00:00:00.000000.

From the perspective of the typical time_t rendering of Unix time, there is no way to uniquely represent that 61st second. It just "disappears".
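A rough way to see the repeat, as a sketch only: the model below assumes a kernel that steps back one second at the leap, as described above. `reported_time_t` and `MIDNIGHT` are made-up names for illustration, not anything from the kernel.

```python
import calendar

# 2015-07-01T00:00:00Z, the first second after the June 2015 leap second.
MIDNIGHT = calendar.timegm((2015, 7, 1, 0, 0, 0))


def reported_time_t(ticks):
    """Hypothetical model of a clock that steps back one second at the leap.

    `ticks` is a counter that advances once per real second and agreed with
    Unix time before the leap. From midnight onward the clock reads one less.
    """
    return ticks - 1 if ticks >= MIDNIGHT else ticks


labels = ["23:59:58", "23:59:59", "23:59:60", "00:00:00"]
readings = [reported_time_t(MIDNIGHT - 2 + i) for i in range(4)]
for label, value in zip(labels, readings):
    print(label, value)
```

In this model 23:59:59 and 23:59:60 print the same number, which is the "disappearing" second.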

Having lived through systems going backwards some 17 seconds due to a botched NTP-GPS appliance config, I can tell you that what died at that particular site was all of the locking code that used wall time clocks instead of monotonic clocks. They all CHECKed and died when their preconditions were no longer valid.

This didn't have to happen. They could have used monotonic from the get-go, and then they wouldn't have died when the clocks got yanked backwards 17 seconds to the proper time base.
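For what it's worth, here is a minimal sketch of the difference. `Lease` and `stepped_wall_clock` are hypothetical names; the real locking code at that site is not shown anywhere in this thread, so treat this as an illustration of the idea rather than what actually ran.

```python
import time


class Lease:
    """Hypothetical lock lease that tracks how long it has been held."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._clock = clock
        self._ttl = ttl_seconds
        self._acquired_at = clock()

    def held_for(self):
        elapsed = self._clock() - self._acquired_at
        # The kind of precondition that fails when the clock is yanked back:
        # elapsed time must never be negative.
        if elapsed < 0:
            raise RuntimeError(f"clock went backwards by {-elapsed:.0f}s")
        return elapsed

    def expired(self):
        return self.held_for() >= self._ttl


def stepped_wall_clock():
    """Fake wall clock: first reading is 17 s ahead, the next is corrected."""
    readings = iter([1000.0 + 17.0, 1000.0 + 5.0])
    return lambda: next(readings)


safe = Lease(ttl_seconds=30)  # default clock is time.monotonic
print("monotonic lease ok:", safe.held_for() >= 0)

fragile = Lease(ttl_seconds=30, clock=stepped_wall_clock())
try:
    fragile.held_for()
except RuntimeError as err:
    print("precondition failed:", err)
```

The fake clock stands in for a wall clock that was 17 seconds fast when the lease was taken and had been corrected five real seconds later. `time.monotonic()` is documented not to go backwards, so the default lease never hits that branch.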


>'From the perspective of the typical time_t rendering of Unix time, there is no way to uniquely represent that 61st second. It just "disappears".'

This is a great and really intuitive summary of the problem. Is there a certain class of problem related to the disappearing second? Like, are these more likely to be filesystem issues or things that rely on timestamps? Or are there second-order problems as well?

>"I can tell you that what died at that particular site was all of the locking code that used wall time clocks instead of monotonic clocks. They all CHECKed and died when their preconditions were no longer valid."

Sorry if this is a silly question, but was that check simply that "time t1 is greater than time t0"? Also, did the size of the step (17 seconds) matter, or would this have been equally catastrophic with a single second?

Well, okay, so, the problem is like this: let's say you wanted to schedule something to happen during that exact extra second at the end of June 2015 when we were all standing around watching UTC do its little extra dance. You pick 1435708799. Trouble is, that Unix time applies to both 23:59:59Z and 23:59:60Z on that particular day.

You can't target it beforehand or after. It's just... gone.

It's not a problem from the point of view of programs, since they just got whatever time_t value they got, and they don't know the bigger perspective. It's more of a mapping from outside->in problem.

Put it another way: try writing a program that'll call clock_gettime() and will say a message at a later time you select. You can't put in 23:59:60Z because there's no way to represent it, and indeed, you won't even be able to tell when the time comes unless you special-case it and notice _that particular second_ repeating itself... or reach into the kernel to look at the leap bit, or worse. It's a real time in meatspace, but you can't target it with the tools at hand. That's the problem.
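The "notice that particular second repeating itself" approach might look something like this sketch. It compares the wall clock against the monotonic clock and flags any sudden change in the gap between them. `wall_minus_monotonic` and `find_steps` are made-up helper names, the 0.5 s threshold is an arbitrary assumption, and this does not look at the kernel's leap bit at all.

```python
import time


def wall_minus_monotonic():
    """Current gap between the wall clock and the monotonic clock, in seconds."""
    return time.time() - time.monotonic()


def find_steps(offsets, threshold=0.5):
    """Return (index, delta) for each jump between consecutive offset samples.

    A steady clock keeps the gap nearly constant; a step in the wall clock
    shows up as a sudden change larger than `threshold` seconds.
    """
    steps = []
    for i in range(1, len(offsets)):
        delta = offsets[i] - offsets[i - 1]
        if abs(delta) > threshold:
            steps.append((i, delta))
    return steps


# Simulated samples: the wall clock repeats one second between samples 1 and 2.
samples = [1000.0, 1000.0, 999.0, 999.0]
print(find_steps(samples))
```

In practice you'd feed it samples from `wall_minus_monotonic()` taken a few times a second. It only tells you after the fact that a step happened, and on a host that slews or smears the leap second there would be no step to see.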

Regarding the 17 second thing, that's because someone decided to switch off the thing which (correctly) applies the adjustment factor to GPS time to make it NTP time. There was a 17 second difference at the time (GPS to NTP), and with it off, we were shipping GPS time to hosts as if it was NTP time.
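The arithmetic is about as simple as it gets. In this sketch `appliance_output` is a hypothetical stand-in for the appliance, and 17 is just the figure quoted above; the offset is not a fixed constant and grows whenever a leap second is inserted.

```python
# Seconds by which GPS time ran ahead of NTP time, per the comment above.
GPS_AHEAD_OF_NTP = 17


def appliance_output(gps_seconds, correction_enabled=True):
    """Time a hypothetical appliance hands to NTP clients."""
    if correction_enabled:
        return gps_seconds - GPS_AHEAD_OF_NTP
    # Correction switched off: raw GPS time goes out labeled as NTP time.
    return gps_seconds


gps_now = 1_000_000  # arbitrary GPS-scale reading for illustration
good = appliance_output(gps_now)
bad = appliance_output(gps_now, correction_enabled=False)
print(bad - good)
```

Hosts fed the uncorrected value sit 17 seconds ahead, so putting them right means stepping their clocks backwards.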

In theory, any clock regression big enough that the real passage of time couldn't push the clock back past the lock code's sanity-check timestamp would have caused this. The thing is, a small-scale time step (from ntpd, say) normally happens at boot, not later, and it happens on a system-by-system basis.

The 17 second excursion happened on hundreds of thousands of machines all at once, and, yeah, it was noticed.

Great explanations. Thanks!

> Having lived through systems going backwards some 17 seconds due to a botched NTP-GPS appliance config

Well then, it really was botched. I was running an off-line hodgepodge of 50 or so RH7.3 + Win2K systems back in 2002, where I had to manually fix the few-second drift every couple of months, and the RH machines all slewed faster or slower to adapt; none ever went backward or jumped forward. (I don't remember how the Win2K boxes handled it; they weren't mission critical and I didn't care much.)