by cbsmith 3401 days ago
> System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.

"fast" is a relative term, and is somewhat orthogonal to "efficient".

There's a reason why certain functions use a vDSO. If you're just going to use a syscall anyway, there's kind of no point.


You're assuming that all cases where the vDSO call is made get paired with a real syscall; that's simply not the case. There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹). Look at the examples the article's author offers:

> formatting dates and times

This shouldn't require a call to localtime; more explanation on the part of the article is required here. Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access. Something else is amiss here.
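To make the "simple math" claim concrete, here is a minimal sketch of the UTC-only case. It uses the well-known days-to-civil-date arithmetic for the proleptic Gregorian calendar; the function name `civil_from_epoch` is made up for illustration. It ignores leap seconds, as POSIX time does, and applies no timezone offset at all.

```python
def civil_from_epoch(seconds: int) -> tuple:
    """Split seconds-since-epoch into (year, month, day, hour, minute, second), UTC only."""
    days, rem = divmod(seconds, 86400)
    hour, rem = divmod(rem, 3600)
    minute, second = divmod(rem, 60)

    # Shift the origin to 0000-03-01 so the leap day falls at the end of a "year".
    z = days + 719468
    era, doe = divmod(z, 146097)  # 400-year era and day-of-era in [0, 146096]
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365  # year of era
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of the March-based year
    mp = (5 * doy + 2) // 153  # month index where 0 means March
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day, hour, minute, second


print(civil_from_epoch(0))  # (1970, 1, 1, 0, 0, 0)
```

Nothing here touches the filesystem. The disagreement in the replies is about what happens once a local timezone has to be applied on top of this.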

> for everything from log messages

You're about to hit disk; a cache'd stat() isn't going to matter.

> to SQL queries.

You're about to hit the network; a cache'd stat() isn't going to matter.

(Now, I'm not saying you shouldn't set TZ; if it saves some syscalls, fine, and it might be the only sane value anyways.)

¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.
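One plausible reading of that rule (the footnote doesn't spell out the reasoning, so this is an assumption): a bare `datetime.now()` returns a naive value that depends on the machine's TZ, while asking for UTC explicitly does not. A minimal sketch, where `utc_now` is a made-up helper name:

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


naive = datetime.now()  # wall-clock time in the machine's TZ, with no offset attached
aware = utc_now()       # carries an explicit UTC offset

print(naive.tzinfo)       # None
print(aware.isoformat())  # ends in +00:00
```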

> You're assuming that all cases where the vDSO call is made gets paired with a real syscall; that's simply not the case.

I don't believe I was. I was merely assuming that there are a lot of cases (as in, potentially thousands of times a second) where code that needs the system time also wants the local time.

> There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹)

As the article demonstrates, whatever we might believe about best practice, actual practice seems to include a lot of cases where it is called.

Given that a given epoch time value can map to different dates & times, depending on timezone... I'm not sure why you think formatting dates & times wouldn't require considering the desired timezone.

You're similarly mistaken that logging a message involves hitting disk. It's a very common configuration for high throughput logs to buffer writing to disk across multiple messages and/or forward to a remote server.
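A rough illustration of the buffering point, using Python's standard `logging.handlers.MemoryHandler`. The `CollectingHandler` class is a hypothetical stand-in for whatever file or network handler would really sit behind the buffer, and the capacity of 100 is an arbitrary choice.

```python
import logging
import logging.handlers


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for a file or remote handler; just collects messages."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


target = CollectingHandler()
# Hold up to 100 records in memory; flush early only on ERROR or above.
buffered = logging.handlers.MemoryHandler(
    capacity=100, flushLevel=logging.ERROR, target=target
)

log = logging.getLogger("buffer_demo")
log.setLevel(logging.INFO)
log.propagate = False
log.addHandler(buffered)

for i in range(10):
    log.info("request %d handled", i)

pending = len(target.messages)    # nothing has reached the target yet
buffered.flush()                  # one batch delivers everything
delivered = len(target.messages)
print(pending, delivered)         # 0 10
```

Each record still gets its own timestamp when it is created, so the per-message time handling is not amortized the way the write is.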

Similarly, SQL queries don't necessarily involve hitting the network (some don't even involve crossing an IPC boundary). Even if you do hit the network, once again, it is very common for multiple network requests to be buffered in user space before making a syscall, and of course a single SQL statement could involve more than one localized timestamp value (though I'd like to think in that case the local timezone would be cached).

> ¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.

Well, if you are writing in Python, then worrying about the syscall overhead of reading the local timezone would seem odd (and for that matter, Python does some odd things with timezones, so I'm not even sure this would reliably trigger the syscall).

>Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access.

To do it simply yes, but not correctly. See the "Falsehoods programmers believe about time" series.

http://infiniteundo.com/post/25326999628/falsehoods-programm... http://infiniteundo.com/post/25509354022/more-falsehoods-pro...

> To do it simply yes, but not correctly.

No, doing it correctly doesn't require filesystem access either. I've read both articles in the past: neither refutes the point I made above. Even if I were incorrect, linking to an article that enumerates tens of things (some of them arguably incorrect) isn't useful.

If you're trying to imply that you need to take timezones into account, yes, you do. Yes, typically those definitions are stored on disk, but the context here is requiring filesystem access each and every time; most libraries (including glibc) will load the timezone definitions once, and keep them in memory. Thus, you can break a seconds-since-epoch out into year/mo/day/etc. with "simple" math, and it doesn't require a filesystem access. (Beyond the amortized one time load, but given the point and purposes of the article, I'm not considering that.)
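The "load once, keep in memory" pattern can be sketched as below. This is not how glibc or any particular library implements it. `FAKE_ZONE_FILES`, `load_zone`, and the zone names are invented for the example, and fixed offsets stand in for real zone data (which also carries DST transition tables).

```python
from datetime import datetime, timedelta, timezone
from functools import lru_cache

LOADS = {"count": 0}  # counts how often the pretend disk read happens

# Hypothetical stand-in for on-disk zone definitions: UTC offsets in hours.
FAKE_ZONE_FILES = {"UTC": 0, "Fixed/Plus2": 2, "Fixed/Minus5": -5}


@lru_cache(maxsize=None)
def load_zone(name: str) -> timezone:
    """Pretend to read a zone definition from disk; cached after the first call."""
    LOADS["count"] += 1
    return timezone(timedelta(hours=FAKE_ZONE_FILES[name]), name)


def format_epoch(seconds: int, zone_name: str) -> str:
    """Convert seconds-since-epoch to an ISO string in the named zone."""
    return datetime.fromtimestamp(seconds, load_zone(zone_name)).isoformat()


stamps = [format_epoch(s, "Fixed/Plus2") for s in range(0, 5000, 1000)]
print(stamps[0])       # 1970-01-01T02:00:00+02:00
print(LOADS["count"])  # 1
```

After the first call, every conversion is pure in-memory arithmetic. What the article describes is different: a check for whether the definition changed, repeated on every call.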

Read the damn article. It explains how it's localtime (the function you need to format time in the user's time zone) that makes the stat call, to check whether the configured time zone changed.

> Read the damn article.

> Please don't insinuate that someone hasn't read an article

I read the article. Yes, localtime requires the call; that wasn't my point. My point was that for plenty of common, server-side code, either this isn't required, or is inconsequential.

The former case that I was considering is the formatting of timestamps into TZs in the context of a request being served by a server. Most server-side TZ conversions I've ever needed can't call localtime, b/c localtime is wired not to the user's timezone but to the TZ of the machine the server's code is running on, which is typically either nonsense, UTC, or whatever the devs like. Server-side code needs (of course, YMMV) to use the user's TZ, whatever that may be, so I'm making calls to a library built for that, e.g., pytz, which doesn't need to stat() the machine's TZ, as there is no point in doing so.
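A sketch of that per-request conversion, assuming the user's zone arrives as a `tzinfo` object. With pytz or a similar library you would pass its zone objects; fixed offsets are used here only so the example runs without any timezone database. The names `render_for_user` and `user_zones` are hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def render_for_user(epoch_seconds: float, user_tz: tzinfo) -> str:
    """Format a timestamp in the given zone; the result does not depend on the server's TZ."""
    return datetime.fromtimestamp(epoch_seconds, tz=user_tz).strftime("%Y-%m-%d %H:%M %z")


# Hypothetical per-user settings; fixed offsets stand in for real zone objects.
user_zones = {
    "alice": timezone(timedelta(hours=9)),
    "bob": timezone(timedelta(hours=-8)),
}

ts = 1_489_000_000  # 2017-03-08 19:06:40 UTC
rendered = {name: render_for_user(ts, tz) for name, tz in user_zones.items()}
print(rendered["alice"])  # 2017-03-09 04:06 +0900
print(rendered["bob"])    # 2017-03-08 11:06 -0800
```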

The other instances the author lists that do require localtime are instances where localtime's stat call is the least of your worries, as you're about to perform other operations that are much more expensive.

Timezones don't exclusively belong to users... Most syslogs (up until systemd) are configured to write out logs in machine localized time. Same goes for web servers. Really, there are a ton of cases where servers need to consider their timezone. I don't much like it, but it nevertheless is true.

Sounds like a misunderstanding of "simple" :)

Hence the scare quotes around simple. The math is in no way straightforward, but it's nonetheless math, esp. once you have the TZ information (if required) in front of you. The point was that there are plenty of operations within a typical server-side codebase that either involve little-to-no syscalls (tagging a record with the current UTC time, or converting a UTC timestamp to an ISO formatted date and time for serialization on the wire, e.g., JSON) or are forced to hit really expensive syscalls, rendering a quite-likely-cached-in-RAM stat() moot (logging, SQL queries).
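For the serialization case mentioned above, a minimal sketch: a stored UTC epoch value is turned into an ISO 8601 string for a JSON payload. The `to_wire` helper and the `created_at` field name are made up for the example.

```python
import json
from datetime import datetime, timezone


def to_wire(record: dict, created_epoch: float) -> str:
    """Serialize a record with an ISO 8601 UTC timestamp attached."""
    stamp = datetime.fromtimestamp(created_epoch, tz=timezone.utc)
    return json.dumps({**record, "created_at": stamp.isoformat()}, sort_keys=True)


payload = to_wire({"id": 7}, 1_489_000_000)
print(payload)  # {"created_at": "2017-03-08T19:06:40+00:00", "id": 7}
```

Because the zone is fixed to UTC, the output is the same whatever TZ the machine is configured with.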