by brandmeyer 3485 days ago
> Instead of adding a single extra second to the end of the day, we'll run the clocks 0.0014% slower across the ten hours before and ten hours after the leap second, and “smear” the extra second across these twenty hours.

Holy leaping second, Batman! Unilaterally being off by up to a half second from the rest of the world's clocks is a pretty aggressive step. I think I would have preferred to see a resolution made by an independent body on something this drastic.
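
To put numbers on the quoted smear, here is a minimal Python sketch. It assumes a perfectly linear smear over an idealized 72,000-second window; the names `SMEAR_WINDOW_S`, `slowdown_percent` and `smeared_so_far` are made up for illustration and are not taken from Google's implementation.

```python
# Idealized linear smear: one leap second spread evenly over twenty hours.
SMEAR_WINDOW_S = 20 * 60 * 60  # ten hours before plus ten hours after, in seconds
LEAP_S = 1.0                   # the extra second being absorbed


def slowdown_percent() -> float:
    """How much slower the smeared clock runs, as a percentage."""
    return LEAP_S / SMEAR_WINDOW_S * 100


def smeared_so_far(seconds_into_window: float) -> float:
    """Seconds of the leap second already absorbed at a point in the window."""
    # Clamp so that times outside the window report 0 or the full second.
    fraction = min(max(seconds_into_window / SMEAR_WINDOW_S, 0.0), 1.0)
    return LEAP_S * fraction


print(f"slowdown: {slowdown_percent():.4f}%")
print(f"absorbed at the midpoint: {smeared_so_far(SMEAR_WINDOW_S / 2)} s")
```

Under these assumptions the slowdown rounds to the quoted 0.0014%, and the gap to an unsmeared clock peaks at half a second in the middle of the window.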

4 comments

You're going to have a bad time if you assume "the rest of the world" isn't doing their own, different adjustment

https://developers.google.com/time/smear#othersmears

> preferred to see a resolution made by an independent body

Independent bodies have spent the last decade debating if leap seconds should even exist. Agreeing on how to treat them if we keep them is way down the priority list.

> You're going to have a bad time...

Hilarious. Was this intentional?

Smearing leap seconds does make sense, but it's an odd step to take unilaterally, rather than coordinating with other NTP servers and with Linux timekeeping (which currently handles leap seconds via a 61-second minute instead).
I think taking this step unilaterally is the only way it's going to be taken. Given that Google doesn't want to deal with leap seconds [1], and that the standards organizations have been debating removing leap seconds for years, at least they're publicizing what they're doing.

[1] for good reasons

It isn't even so simple as "work with others" - the other people involved here aren't even thinking at this level. Leap seconds are an ugly hack that were inserted without any thought as to the impact on computer systems. We should not use them, they exist as a vanity project imo.

Remember when leap seconds caused ALL JVMs to lock up until restarted? Or kernel bugs? ick!

> Leap seconds are an ugly hack that were inserted without any thought as to the impact on computer systems.

No. Leap seconds were a rationalization of a prior system that really was ugly for computers. The conversion between TAI and UT(n) that was used before UTC involved table-driven algorithms with multiple rules and microsecond adjustments.

If you thought that six months' notice to add a leap second to /etc/leapsecs.dat is a huge imposition, then you should try creating a computer system that can cope with rules like "for the next three months, from January the 1st to March the 31st 1964, you must add 0.001296 of a second for each day since 38761 and then add a further 3.240130 seconds".
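
For a sense of what such a rule looks like in code, here is a rough Python sketch of just that one quarter. It assumes that 38761 is a Modified Julian Date (the rule as quoted does not say so), and the helper names `mjd` and `offset_seconds_early_1964` are hypothetical.

```python
from datetime import date

# Modified Julian Date 0 is 1858-11-17.
MJD_EPOCH_ORDINAL = date(1858, 11, 17).toordinal()


def mjd(d: date) -> int:
    """Modified Julian Date of a calendar date (integer days)."""
    return d.toordinal() - MJD_EPOCH_ORDINAL


def offset_seconds_early_1964(d: date) -> float:
    """Correction prescribed by the quoted rule, valid for this quarter only."""
    if not (date(1964, 1, 1) <= d <= date(1964, 3, 31)):
        raise ValueError("this rule only covers 1964-01-01 through 1964-03-31")
    # Assumption: "each day since 38761" counts days relative to MJD 38761.
    return 3.240130 + (mjd(d) - 38761) * 0.001296


print(offset_seconds_early_1964(date(1964, 1, 1)))
print(offset_seconds_early_1964(date(1964, 3, 31)))
```

A real system would need one such branch per rule period, which is the table-driven mess described above.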

Ironically, UTC and the leap second system are geared towards the same sort of timekeeping that computers do and away from the civil timekeeping that preceded it: a constant length second that can be measured with oscillators and electronic counters, being the basis for civil time; rather than astronomical calculation.

In effect, leap time has been used since the middle of the first millennium BC: the Babylonians discovered the difference between mean solar time and sidereal time, and clocks have been corrected ever since. Reconciling relative earth surface time with earth mean solar time meant that time had to be inserted or removed somewhere.

In the spirit of DevOps (small, frequent changes are better than big, infrequent ones), we have leap seconds instead of leap minutes, hours, days, etc. This way, noon is still when the sun is at its highest (+/- 0.5 relative earth surface seconds).

Leap minutes could be publicized a century before being implemented, making sure all libraries accounted for them, just like leap hours.

Which then means, if you have a library that can support leap hours, why not leap seconds? More frequent small changes are much better than infrequent large changes; at the very least it will disabuse programmers of a poor understanding of time and how to track it properly.
>Linux timekeeping (which currently handles leap seconds via a 61-second minute instead)

Google doesn't think so: "No commonly used operating system is able to handle a minute with 61 seconds"

Bits and pieces of the operating system might handle leap seconds properly, but it's doubtful that every single component that uses time does the right thing. The last two leap seconds revealed bugs in the kernel: https://lwn.net/Articles/504744/ for the one in 2012 and https://lwn.net/Articles/648313/ for the one in 2015, and I wouldn't be surprised if the one scheduled for December reveals another.
Some things did go wrong on a few of the previous leap-second injections, and the Linux timekeeping maintainer had talked about changing the approach to handling them (which has already changed at least once in the past).

I don't, however, think it makes sense to unilaterally change this, without (any obvious signs of) coordination with the timekeeping maintainers and the maintainers of major NTP servers.

Things go wrong on every leap second, sometimes catastrophically. They go wrong on non-leap-seconds because of falsely advertised leap seconds. They go wrong 4 months before a leap second because a leap indicator got set and some software had an incorrect idea of when it was due.

Never mind the theory, the practice is a clusterfuck.

One of the big problems is application support. How many will break by seeing 60 as current second as opposed to 59 twice?
I am sure tons of applications that use gettimeofday() to keep track of time can break in subtle ways when seeing 59 twice. Of course, they're arguably broken already, given that clock_gettime() exists; however, that POSIX interface isn't monotonic by default either, and the monotonic versions of it are Linux-only implementations.
> I am sure tons of applications that use gettimeofday() to keep track of time can break in subtle ways when seeing 59 twice.

gettimeofday doesn't return hour/minute/second divisions; it just returns seconds/microseconds since the epoch. Functions like strftime and gmtime handle the components of time. And leap seconds don't make applications see 59 twice; they make them see 60 once (58, 59, 60, 0, 1, ...).
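
The same split can be seen from Python, whose `time` module wraps the C functions mentioned above. This is only an illustrative sketch: the leap-second tuple is built by hand for 2016-12-31, and `label` is a name chosen for the example.

```python
import time

# Like gettimeofday(): a bare count of seconds since the epoch, no fields.
now = time.time()

# gmtime() is what breaks the count down into hours, minutes and seconds.
parts = time.gmtime(now)
print(type(now).__name__, parts.tm_hour, parts.tm_min, parts.tm_sec)

# A hand-built broken-down time for a leap second: tm_sec is 60.
# Fields: year, month, day, hour, minute, second, weekday, yearday, isdst.
leap = time.struct_time((2016, 12, 31, 23, 59, 60, 5, 366, 0))
label = time.strftime("%H:%M:%S", leap)
print(label)
```

The formatting side accepts a seconds field of 60 without complaint; whether the code that consumes the resulting string copes with it is a separate question.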

Quoting the manpages for gmtime and strftime:

> tm_sec The number of seconds after the minute, normally in the range 0 to 59, but can be up to 60 to allow for leap seconds.

> %S The second as a decimal number (range 00 to 60). (The range is up to 60 to allow for occasional leap seconds.) (Calculated from tm_sec.)

Break them, fix them, and move on. Must we coddle every programmer's incompetence?

I'd assume it's because, at Google scale, you can dictate what "time" is considered internally.

> All Google services, including all APIs, will be synchronized on smeared time, as described above. You’ll also get smeared time for virtual machines on Compute Engine if you follow our recommended settings.

This seems to be the Google way sometimes. "We're going to take a standard and change it and do things our way. Toodles!" Just like what they did with IMAP & Gmail.
It's far better than what POSIX clocks do. They'll just drop back a second and you'll get the same time twice.
That is a historical artifact. The original Unix developers decided to treat time as seconds since the start of 1970, implicitly assuming that every day has 86400 seconds. Back then UTC was in its infancy, most programmers had not even heard of leap seconds, and most computer clocks were set by the sysadmin looking at his watch. If we were starting from scratch we would have a date-time type with a day number field and a seconds-since-midnight field. However that would be a breaking change for every piece of software out there, so we are stuck with a time_t that cannot handle leap seconds.
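
A small Python sketch of that limitation, using the leap second at the end of 2016-12-31 as the example. `calendar.timegm` does the same 86400-seconds-per-day arithmetic, so two different UTC labels land on one epoch value.

```python
import calendar

# Epoch seconds computed with the every-day-has-86400-seconds assumption.
before = calendar.timegm((2016, 12, 31, 23, 59, 59))
leap = calendar.timegm((2016, 12, 31, 23, 59, 60))   # the leap second itself
after = calendar.timegm((2017, 1, 1, 0, 0, 0))

print(before, leap, after)
print("leap second collides with midnight:", leap == after)
```

Because 23:59:60 and 00:00:00 map to the same count, a clock built on this representation has no way to show the extra second except by repeating or stretching one.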
Or we would use the NTP timestamp directly (or a lower precision version) as time_t, which (AFAIK) doesn't suffer from leap seconds. One can always convert time_t to struct tm; tm_sec is defined to be in the range of [0..60].
They've been doing this since at least 2011.
That doesn't make it any less unilateral. IMO, it makes it rather worse to have been doing this for five years. The situation would be much better if they had been working to build a broader consensus over all that time. As near as I can tell, they don't even have consensus within Linux, let alone POSIX or the ITU.
In the absence of a published and agreed standard, every approach is unilateral. My company is taking a similar approach - disconnecting from external NTP servers on 31st December, stepping the change in gradually, and reconnecting when we're "right" again.

Google have never been in the NTP business - there's no reason for them to have worked towards a consensus on this. But when a company their size makes their approach publicly available to all, it starts to pave the way for a consistent standard for everyone.

Google is in the NTP business. Chromebooks sync time from their servers, Android devices can too (though they can also use carrier-provided time signals), and so do hosts in their cloud services.
Honest question: what's so bad about others' systems running on slightly off time? I get why people care about internal consistency, and why deviations should be quite small, but this?
During the last leap second, I had servers configured against Google's semi-public servers and some other good sources of time. ntpd marked the Google servers as falsetickers sometime during the distortion and, when it was over, was happy with them again. However, I have more non-Google servers than Google servers, and high minpoll times, which tend to result in time checks between servers happening far apart in time, so even if I had multiple Google servers, they wouldn't look very close together.
Slightly contrived example: Let's say that you were running a distributed database, and you had distributed instances across different cloud providers for increased reliability. If your database relies on high-resolution timestamps for distributed conflict resolution, then you're going to have a hard time.

Another example: Suppose that a portion of an industrial monitoring system processes remote sensor data in a cloud datacenter with smeared time, while the sensor nodes keep strict UTC time. Your SCADA system had better not have any hard-baked assumptions like "messages cannot come from the future", or you're going to have a hard time, too.
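
Both failure modes are easy to sketch in Python. Everything here is hypothetical: the `Write` class, the node names, and the half-second lag are illustration only, with the lag taken as the worst case for a smeared clock relative to strict UTC.

```python
from dataclasses import dataclass


@dataclass
class Write:
    node: str
    value: str
    true_time: float   # when the write really happened, in seconds
    clock_skew: float  # the node's clock error relative to UTC, in seconds

    @property
    def stamped(self) -> float:
        """Timestamp the node attaches, as read from its own clock."""
        return self.true_time + self.clock_skew


def last_write_wins(a: Write, b: Write) -> Write:
    """Naive conflict resolution: keep the write with the later timestamp."""
    return a if a.stamped >= b.stamped else b


def looks_like_future(message_stamp: float, local_clock: float) -> bool:
    """The kind of hard-baked sanity check a receiver might apply."""
    return message_stamp > local_clock


first = Write("utc-node", "old", true_time=100.0, clock_skew=0.0)
second = Write("smeared-node", "new", true_time=100.2, clock_skew=-0.5)
winner = last_write_wins(first, second)
print("winner:", winner.value)  # the earlier write survives

# A UTC sensor stamps 100.0; the smeared receiver's clock reads 99.6 on arrival.
print("from the future:", looks_like_future(100.0, 100.1 - 0.5))
```

In this toy setup the later write loses and a perfectly ordinary message appears to come from the future, purely because one side's clock lags by half a second.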

Let's say that a company's internal NTP servers include several sources for reliability and redundancy. Much like Google DNS, perhaps one of the sources is Google NTP, while another is derived from the NTP pool. How do you expect the NTP daemon to behave in this situation? It will certainly be able to observe a 500ms difference between its source timeservers.

Both of those examples strike me as very contrived.

I can't think of anyone who cares that much about timekeeping who isn't running their own internal NTP infrastructure.

Google's Spanner requires accurate global time, so they deployed GPS and atomic clocks. Same for CDMA. There are some applications for high-resolution time (eg finance), so protocols like PTP exist.

A smeared NTP source in an otherwise normal list of time sources doesn't seem like that big of a deal either - eventually the daemon is just going to mark it as a falseticker and life goes on.
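
As a rough illustration of why one odd source gets outvoted, here is a toy Python check. It is a simple distance-from-the-median test, not the selection algorithm ntpd actually uses, and the source names, offsets and `tolerance_s` threshold are all made up.

```python
from statistics import median


def flag_outliers(offsets: dict[str, float], tolerance_s: float = 0.1) -> set[str]:
    """Return the sources whose offset is far from the median offset."""
    mid = median(offsets.values())
    return {name for name, off in offsets.items() if abs(off - mid) > tolerance_s}


# One smeared source among three ordinary ones, at the worst point of the smear.
mostly_normal = {"pool-a": 0.002, "pool-b": -0.004, "pool-c": 0.001, "smeared": -0.5}
print(flag_outliers(mostly_normal))

# An even split leaves no majority, so this toy check distrusts everyone.
even_split = {"pool-a": 0.0, "pool-b": 0.0, "smeared-1": -0.5, "smeared-2": -0.5}
print(flag_outliers(even_split))
```

With a clear majority of ordinary sources only the smeared one is flagged; with an even split there is nothing sensible to converge on, so having enough non-smeared sources matters.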

Everywhere Google documents the service, they clearly state you should not mix their smearing NTP servers with non-smearing NTP servers.
I'm not convinced it would do anything harmful, so long as you have enough NTP sources (which you should have anyway).

From their FAQ:

> We recommend that you do not mix smeared and non-smeared NTP servers. The results during a leap second may be unpredictable.

I read that as a soft SHOULD NOT, not MUST NOT. Would be a fun exercise to try doing it intentionally with common NTP implementations and see what happens.

first example: ok, yes, if they offer the DB as a service, that would be bad.

If you run it on a VM it's IMHO your responsibility to make sure your time sensitive database nodes have shared time.

Same for the second example, interesting point for SaaS scenario, although it seems like that could break through normal deviations already.

EDIT: ok, the blog post actually mentions "local clocks in sync with VM instances running on Google Compute Engine", my bad. Not sure what to think about that. In comparison, Amazon recommends running NTP on your VMs and their Linux AMIs come with pool.ntp.org configured as default. </edit>

Third: It's going to figure out some solution (if Google is only one source it's probably going to drop it as faulty), but you probably should not have added a time source that's officially documented to not strictly follow standards. It's not like Google offered an NTP service for years and now suddenly switched how it works.

I guess I underestimate the amount of trust people put into random time sources: practice is probably messier than theory.