Hacker News
by amluto 3612 days ago
They added a feature that impressively fails to interoperate with the rest of the world.

> Added well-known type protos (any.proto, empty.proto, timestamp.proto, duration.proto, etc.). Users can import and use these protos just like regular proto files. Additional runtime support are available for each language.

From timestamp.proto:

  // A Timestamp represents a point in time independent of any time zone
  // or calendar, represented as seconds and fractions of seconds at
  // nanosecond resolution in UTC Epoch time. It is encoded using the
  // Proleptic Gregorian Calendar which extends the Gregorian calendar
  // backwards to year one. It is encoded assuming all minutes are 60
  // seconds long, i.e. leap seconds are "smeared" so that no leap second
  // table is needed for interpretation.
Nice, sort of -- all UTC times are representable. But you can't display the time in normal human-readable form without a leap-second table, and even their sample code is wrong in almost all cases:

  //     struct timeval tv;
  //     gettimeofday(&tv, NULL);
  //
  //     Timestamp timestamp;
  //     timestamp.set_seconds(tv.tv_sec);
  //     timestamp.set_nanos(tv.tv_usec * 1000);
That's only right if you run your computer in Google time. And, damn it, Google time leaked out into public NTP the last time there was a leap second, breaking all kinds of things.

Sticking one's head in the sand and pretending there are no leap seconds is one thing, but designing a protocol that breaks interoperability with people who don't bury their heads in the sand is another thing entirely.

Edit: fixed formatting

12 comments

It's interesting that you refer to a huge amount of planning and engineering as "sticking your head in the sand".

https://googleblog.blogspot.com/2011/09/time-technology-and-...

I think that the approach everything else uses is the "sticking your head in the sand" approach. You basically pretend that there is no problem and that time is perfectly accurate, up until you have a minute with 59 or 61 seconds.

Just because suddenly trying to handle "Oh shit, everything is off by an entire second!" is the approach everything else uses doesn't mean it is the right approach.

No, I agree they did a bunch of good engineering for internal use.

But they didn't keep it internal properly -- the real world has leap seconds for better or for worse, and this library really does stick its head in the sand and pretend they don't exist. Google specifically says that this library is designed to be "the foundation of Google's new API platform". Yet they give a data type (as a headline feature) and a sample usage that is simply incorrect if you don't set your system to work using Google's "leap smear". It also seems quite likely that it'll result in blatantly wrong human-readable strings. I'll even quote a string from timestamp.proto [1]:

9999-12-31T23:59:59Z

That looks like an RFC 3339 string, and it even has the 'Z' suffix, which means it's UTC, which has an agreed-upon international definition. But this is not a valid UTC time. It's a time in a different time zone that Google made up.

Google easily could have done better: publish a spec for a different kind of time like:

9999-12-31T23:59:59s

where the little 's' means 'smeared'. Supply a serializer and deserializer for that. Now there's no ambiguity.
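A rough sketch of what such a serializer/deserializer pair could look like in Python. The trailing 's' marker is the hypothetical suffix proposed above, not part of RFC 3339 or any published spec, and the names `format_time` and `parse_time` are made up for illustration.

```python
from datetime import datetime

# 'Z' is the real RFC 3339 marker for UTC; 's' is the hypothetical marker for
# "smeared" time proposed in the comment above.
SCALES = {"Z": "utc", "s": "smeared"}
SUFFIXES = {scale: suffix for suffix, scale in SCALES.items()}
FORMAT = "%Y-%m-%dT%H:%M:%S"


def format_time(dt: datetime, scale: str) -> str:
    """Serialize a naive datetime, tagging it with its time scale."""
    return dt.strftime(FORMAT) + SUFFIXES[scale]


def parse_time(text: str) -> tuple[datetime, str]:
    """Return (naive datetime, scale); reject strings with an unknown suffix."""
    body, suffix = text[:-1], text[-1:]
    if suffix not in SCALES:
        raise ValueError(f"unknown time scale suffix: {suffix!r}")
    return datetime.strptime(body, FORMAT), SCALES[suffix]
```

With a scheme like this, a reader that only understands true UTC can refuse an 's' string outright instead of silently misinterpreting it.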

[1] https://github.com/google/protobuf/blob/master/src/google/pr...

>You basically pretend that there is no problem and that time is perfectly accurate, up until you have a minute with 59 or 61 seconds.

Time is perfectly accurate, including all the minutes with 59 or 61 seconds. UTC is perfectly defined as atomic time (TAI) with an offset to keep it within 0.9 seconds of UT1 (time as measured by the rotation of the earth). Every time we increment or decrement this offset, this leads to leap seconds. But since 23:59:60 is a valid time (and distinct from 00:00:00 on days with leap seconds), there is no ambiguity here.
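To make the TAI/UTC relationship concrete, here is a minimal sketch. The table is only an excerpt of the published TAI minus UTC offsets (35 s from 2012-07-01, 36 s from 2015-07-01, 37 s from 2017-01-01); a real implementation would load the full table and keep it updated. The name `tai_minus_utc` is hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Excerpt of the TAI - UTC table: (first UTC day the offset applies, seconds).
# Only three entries are included; earlier dates are rejected rather than guessed.
_OFFSETS = [
    (date(2012, 7, 1), 35),
    (date(2015, 7, 1), 36),
    (date(2017, 1, 1), 37),
]
_STARTS = [start for start, _ in _OFFSETS]


def tai_minus_utc(day: date) -> int:
    """Return TAI - UTC in whole seconds for a UTC calendar day."""
    index = bisect_right(_STARTS, day) - 1
    if index < 0:
        raise ValueError("date predates this excerpt of the table")
    return _OFFSETS[index][1]
```

Each step in the table corresponds to one inserted leap second, that is, one UTC minute that had 61 seconds.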

The problem here is how most computers handle this: introducing ambiguity by setting the clock backwards or forwards one second, instead of accounting for the fact that not all minutes have 60 seconds. Google did a pragmatic fix for their use case by squeezing leap seconds into the surrounding seconds, stretching them. It works for them, but now their "seconds" are not actual seconds anymore.
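As an illustration of what "stretching" means, here is a minimal sketch of a linear smear. The 24-hour window is an assumption chosen for simplicity, not a description of Google's production smear, and `smear_fraction` and `smeared_second_length` are made-up names.

```python
WINDOW = 86_400  # assumed smear window, measured in smeared clock seconds


def smear_fraction(elapsed: float) -> float:
    """Fraction of the leap second absorbed after `elapsed` clock seconds
    into the window, clamped to the range [0, 1]."""
    return min(max(elapsed / WINDOW, 0.0), 1.0)


def smeared_second_length() -> float:
    """Length of one smeared clock second in SI seconds during the window:
    86401 SI seconds are spread over 86400 clock ticks."""
    return (WINDOW + 1) / WINDOW
```

Under these assumptions every clock second inside the window lasts about 11.6 microseconds longer than an SI second.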

It's fine as a timestamp implementation, and great for many uses. But I think there is a big problem with the documentation. They start off by saying it's "at nanosecond resolution in UTC Epoch time", and then they go on to explain how it uses a completely different encoding that is neither compatible with UTC nor with TAI (atomic time which ignores leap seconds). And then they jump ahead to sample code which again pretends that the timestamp is UTC.

No matter whether you like "google time" or not, this is horrible documentation. They are glossing over an issue which should be marked with big red letters.

The question of how to reconcile leap-second-smearing systems with other systems is an interesting and important one. I'm not sure that timestamp.proto changes this issue: prior to timestamp.proto systems would still communicate using UNIX time (smeared or non-smeared) using plain integer or double seconds. timestamp.proto just provides a structure for storing UNIX time with greater range and precision than a single integer or floating point number can provide.

What I'm trying to say is that I think this is a smearing systems vs. non-smearing systems issue, and not so much a timestamp.proto issue. timestamp.proto mentions smearing but really it's just a vehicle for storing the seconds/nanos from the system clock, with whatever semantics that system clock uses. Because in practice systems don't give you access to both the smeared and non-smeared values; you get whatever the system gives you. The remarks about being leap-second-ignorant apply whether the leap second is being smeared or repeated.
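A small sketch of the precision point, in Python rather than protobuf-generated code. The helper `split_unix_nanos` is hypothetical; it only mirrors the seconds/nanos split, and the values carry whatever semantics (smeared or not) the system clock has.

```python
import time

NANOS_PER_SECOND = 1_000_000_000


def split_unix_nanos(total_nanos: int) -> tuple[int, int]:
    """Split integer nanoseconds since the Unix epoch into (seconds, nanos),
    with 0 <= nanos < 1e9 even for negative inputs."""
    return divmod(total_nanos, NANOS_PER_SECOND)


exact = 1_483_228_800_123_456_789         # 2017-01-01T00:00:00.123456789 as Unix time
seconds, nanos = split_unix_nanos(exact)   # (1483228800, 123456789)

# A double cannot hold this value: near 1.5e9 seconds its spacing is a few hundred ns.
as_double = exact / NANOS_PER_SECOND
round_trip = round(as_double * NANOS_PER_SECOND)
lossy = round_trip != exact                # True

# Whatever the system clock reports goes in unchanged.
now_seconds, now_nanos = split_unix_nanos(time.time_ns())
```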

Google implemented leap-second smearing in 2011, before the big push towards cloud. So the need to communicate sub-second timestamps between internal Google systems and external systems was probably not so much on people's minds. But these days we're releasing a bunch of APIs, and sub-second timestamps might become a more important issue for some of them.

So I think this issue is worth discussing further, and I opened an issue on GitHub to track it: https://github.com/google/protobuf/issues/1890

Thanks for the feedback.

This is only an issue if you use the Timestamp to represent a human-readable time. There are more uses for timestamping than for display to a human operator. For example, one might use a timestamp in a software system to detect the passage of time, as in the use of a monotonic clock. In a real-time system you would ignore the presence of leap seconds because you will never examine the timing of your system relative to a Gregorian calendar. Rather, you just want to make sure that the station-keeping engine on your satellite burns for exactly 250 milliseconds, and leap seconds are of no use in that application.

If you use Google's timestamp type to burn for 250ms, you might end up with 250*86401/86400 ms. That's not a fantastic outcome.
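Working the arithmetic through, under the assumption that the smear spreads one extra SI second evenly over a day of 86400 clock seconds:

```python
from fractions import Fraction

# Assumption: during the smear, 86400 clock seconds span 86401 SI seconds.
STRETCH = Fraction(86_401, 86_400)

requested_ms = 250
actual_ms = requested_ms * STRETCH             # exact rational result
excess_us = (actual_ms - requested_ms) * 1000  # overshoot in microseconds

print(float(actual_ms))   # about 250.0029 ms
print(float(excess_us))   # about 2.89 microseconds
```

So the burn would overshoot by roughly 2.9 microseconds; whether that matters depends on the application.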
I think you have it exactly backwards, if I understand things correctly.

It _seems_ like their "UTC Epoch time" is the same thing as POSIX time, but the Google engineer's terminology is all fubar. The reliance on the Proleptic Gregorian Calendar is further proof as that's a reference to a specific algorithm for calculating calendar dates.

POSIX time says that there are precisely 86400 "seconds" per day, which I think implies the same thing as saying there are precisely 60 "seconds" per minute. The logical consequence is, of course, that in neither case is "second" referring to the SI second.
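The fixed-length-day rule can be seen directly in Python's standard library: `calendar.timegm` does the same kind of arithmetic and performs no range check on the seconds field, so a leap second and the following midnight collapse into one value. The example uses the leap second at the end of June 2015.

```python
import calendar

# 2015-06-30T23:59:60Z was a real leap second. timegm does plain arithmetic
# (days * 86400 + hours * 3600 + minutes * 60 + seconds) without validation.
leap_second = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))
next_midnight = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))
previous_midnight = calendar.timegm((2015, 6, 30, 0, 0, 0, 0, 0, 0))

# Two distinct UTC instants map to one POSIX value.
same_value = leap_second == next_midnight            # True

# The day that actually contained 86401 SI seconds still counts as 86400.
seconds_in_day = next_midnight - previous_midnight   # 86400
```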

Once you get over the fact that we're discussing different units of time, then you can see that POSIX time is _perfect_ for recording and manipulating civil calendar time. For the purposes of calendar manipulation, you rarely if ever need to know elapsed time in SI-unit seconds. All you care about is easily calculating past and _future_ calendar information. Your power company and credit card companies don't bill you by SI seconds, they bill you by the hour, day, week, or month.

Conversely, in those situations where you want accurate and precise SI-second measurements, you rarely if ever want to convert or display that data in terms of calendar time. When SpaceX sends a rocket into space, the view screen shows elapsed seconds since launch, not elapsed seconds since lunch. That's a big difference.

Interestingly, in neither case do leap seconds matter! They're irrelevant. Leap seconds play no part in either TAI or POSIX time.

There are some cases where you want both pieces of information, but I think it's usually a mistake to conflate them and try to shoehorn them into the same units. That misguided practice is behind all the anxiety about leap seconds in UTC time.

It's also worth noting that as clocks become increasingly precise and accurate, the whole leap second thing will fade away. UTC time is based on the fiction that there's an abstract, universal clock in the world that is measurable in SI seconds. There isn't. At some point the needs of routine industrial measurements will enter the realm where relativity governs, at which point the fiction will be laid bare. Calendar time, of course, doesn't rely on that fiction.

The move to uncouple civil time from solar time is totally misguided, IMO, and only exacerbates the improper way that software engineers conflate the purpose and function of various time measurements.

I've always used uint64's for that. Why would you need a distinct type?

It's a serialization format containing seconds and nanoseconds. You can put whatever you want in there, including true (non-Google) UTC time, right? This seems more like a documentation problem than an actual problem with Protobuf.

It saddens me that this is the top comment. It's complete and total FUD unrelated in any way to what Proto is, and to boot, it's an optional type, provided if you want it, but otherwise not forced to be used in any way! Scroll down the page for much more worthwhile discussions of Proto.

A tangent, perhaps. But it's not FUD.

There's really no reason you can't provide your own timestamp structure, or your own timestamp transformation logic...

I'm glad they're willing to break compatibility to push their approach, because I think it's a better one. UTC with leap seconds is the worst of all possible worlds - not suitable for human time, not suitable for system time either - as perennial leap second bugs in such high-profile projects as the Linux kernel demonstrate. Everyone seems to have agreed for years that basing system time on something without leap seconds would be better - whether that be leap smears or TAI - but no-one bothers to take action.

Regarding the leaking of NTP, are you talking about systemd's default pointing at Google's NTP servers or some other event?

> designing a protocol

It's not a full protocol. It's a data type for a serialization library. You can write your own data types and they serialize just as well as the built-in types.

> that breaks interoperability

Wait, what was "broken" here? What was working before that isn't with this new release? What does this inclusion of a utility data type in a serialization library break that previously was intact?

Does this depend on use of Google's time servers?

The dependence on "smeared" leap seconds sure sounds like a dependence on such a time server.

Ouch.

I can see caring about leap seconds right now, but a few seconds back or forth in the past probably won't matter very much.