Hacker News new | ask | show | jobs
by roelb 1665 days ago
Regarding time and timezones, it's best to not prematurely optimise. Ideally store both. It's much more valuable to track the timezone from your source then to have to worry about it later. ISO8601 is the universal standard.

A case where you see this issue play out is the horrible Strava-*device sync. You track in a different timezone and they store+visualise activities against some weird client-side profile setting, which causes morning runs to render at 11pm. Only adding timezone later on. They totally mess this up.

1 comments

First of all, I don't see how using unix epoch timestamps can be called "premature optimization", it's a pretty widely used and standardized way of saving timestamps.

Secondly, if you're in "SQLite Strict Land" and you don't have access to abstractions like "tell the database this is a date", then the best way of storing timestamps is unix epoch, I would be extremely surprised if databases don't already do this behind the scenes when abstracting away things like dates for the user.

Thirdly, what is the value of tracking "source timezone"? This is solving a non-existent problem: if you are getting the timestamp in unix epoch from the source, and you're storing it in unix epoch, the "source timezone" is already known: it's UTC just like all unix timestamps are.

Timezones is fundamentally a data presentation concern, and I strongly believe they should not be a part of the source data.

> You track in a different timezone and they store+visualise activities against some weird client-side profile setting, which causes morning runs to render at 11pm. Only adding timezone later on. They totally mess this up.

This is exactly the type of issue that happens because you involve timezones in your source data. If all applications and databases only concern themselves with unix timestamps, and the conversion to a specific timezone only happens in the application layer upon display, this type of issue simply does not happen, because time "1638097466" when I'm writing this is the exact same time everywhere on the globe.

(Of course, similar issues can happen due to user error if two applications have different time zone settings and the user mistakenly enter a timestamp in the wrong TZ, but that's definitely not solved by making time zones a part of the data itself)

I agree that this is the preferred way of dealing with it. Unfortunately, it's not always possible. Some cases that come to mind:

- Importing event/action data that contains date/time values with a timezone but insufficient information on the place where the event occurred. Converting to UTC and throwing away the timezone means you're losing information. You can no longer recover the local time of the event. You can no longer answer questions like, did this happen in broad daylight or in the dark?

- Importing local date/time data without a timezone or place (I've seen lots of financial records like this). In this case, you simply don't know where on the timeline the event actually took place. The best you can do is store the local date/time info as is. You can't even sort it with date/time data from other sources. It's bad data but throwing it away or incorrectly converting it to UTC may be worse.

- Calendar/reminder/alarm clock kind of apps. You don't want to set your alarm to 7 am, travel to another timezone and have the alarm blare at you at 4 am. Sometimes you really do want local time.

- There are other cases where local times are not strictly necessary but so much more convenient. Think shop opening hours in a mapping app for instance. You don't want to update all opening hours every time something timezone related happens, such as the beginning or end of daylight saving time.

You are correct, there are many other reasons for saving dates and times in a database apart from recording "events", and for those it does make sense to use "relative" descriptions of time or date. I'd argue though, that if your data collection of events is imperfect, like if you have no idea which system an event came from or whether you can trust the syntax of the timestamps, those are primary problems that should be fixed and not "worked around" by changing how your timestamps are saved.

For example, if you don't know where a data point originated, that's already a pretty big issue regardless of whether the time is correct or not. If you have financial data with ambiguous timestamps, this is not only a problem but potentially a compliance problem, since banks are heavily regulated. I think it's unlikely that it's acceptable for a bank to be unable to answer the question "when did this event take place", so the fundamental issue should be fixed, not tolerated.

You're not always in a position to fix these issues, nor is it always worth the effort. It may be a complete side-issue that no one cares about. The information may exist somewhere but not in the dataset you're working with.

Having done quite a bit of data integration work in my life, I can tell you that fixing things can range from the unavoidable to the quixotic.

Another use case is the definition of a business day.

A business day is almost always in local time. Example: business is open from 9AM until 6PM. This is affected by day light saving and it moves back and forth in UTC.

Working with financial systems, payments are business day aligned and if you try to have a system with UTC time you might be surprised what sort of issues are surface. I learned these things the hard way. :)

That only applies if what you're storing identifies a point in time. In business code we often have to work with date times in a specific timezone. e.g. if you schedule a meeting on 2022-07-01 09:00 in Berlin time, throwing away the timezone would be a mistake which results in incorrect behaviour when the DST rules change.
> That only applies if what you're storing identifies a point in time.

A meeting is definitely "a point in time", and your example illustrates my point: the display of "2022-07-01 09:00 UTC+2" should be up to the client (which has a timezone setting) but the storage of this meeting's time should, in my opinion, not be "2022-07-01 09:00 UTC+2" but the timezone-neutral "1656658800".

Since there is technically no such thing as "Berlin time", it should be up to the client to decide whether the date you selected in the future (July 1, 2022) is UTC+1 (winter) or UTC+2 (summer) and then store the time in a neutral way to ensure it's then displayed correctly in other clients that have another timezone set (for example the crazy Indian UTC+5:30).

Meetings are a good example of when this matters because it's important that meetings happen at the same time for everyone, which will always happen when the source data is unix epoch. Of course it also happens with correct time zone conversions, but to me that adds unnecessary complexity. Other examples given, like the opening hours of a store, are probably better to store in a "relative" way (09:00, Europe/Berlin) as the second part of that tuple will vary between UTC+1 and UTC+2.