by tytso 3401 days ago
System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.

I ran an experiment where I timed the runtime of the sample program provided in the OP, except I changed the number of calls to localtime() from ten times to a million. I then timed the difference with and without export TZ=:/etc/localhost. The net savings was .6 seconds. So for a single call to localtime(3), the net savings is 0.6 microseconds.

That's non-zero, but it's likely in the noise compared to everything else that your program might be doing.
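The per-call figure above is just the total saving divided by the number of calls. A minimal sketch of that arithmetic, using only the numbers quoted in the comment (the function name is made up for illustration):

```python
def per_call_microseconds(total_seconds_saved: float, calls: int) -> float:
    """Spread a total wall-clock saving evenly across the number of calls."""
    return total_seconds_saved / calls * 1_000_000


# Figures quoted above: 0.6 s saved over one million localtime() calls.
saving_us = per_call_microseconds(0.6, 1_000_000)
print(f"{saving_us:.1f} microseconds per call")
```

The same function applies to any of the other timings reported in this thread, as long as the iteration count is known.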

9 comments

> System calls in Linux are really fast. So saving "thousands" of system calls when /etc/localtime is in cache doesn't actually save that much actual CPU time.

"fast" is a relative term, and is somewhat orthogonal to "efficient".

There's a reason why certain functions use a vDSO. If you're just going to use a syscall anyway, there's kind of no point.

You're assuming that all cases where the vDSO call is made gets paired with a real syscall; that's simply not the case. There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹). Look at the examples the article's author offers:

> formatting dates and times

This shouldn't require a call to localtime; more explanation on the part of the article is required here. Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access. Something else is amiss here.

> for everything from log messages

You're about to hit disk; a cache'd stat() isn't going to matter.

> to SQL queries.

You're about to hit the network; a cache'd stat() isn't going to matter.

(Now, I'm not saying you shouldn't set TZ; if it saves some syscalls, fine, and it might be the only sane value anyways.)

¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.

> You're assuming that all cases where the vDSO call is made gets paired with a real syscall; that's simply not the case.

I don't believe I was. I was merely assuming that there are a lot of cases (as in, potentially thousands of times a second) where code that needs the system time also wants the local time.

> There are plenty of calls in a server that won't need localtime (basically, anything that just needs the current time in UTC: best-practice code should not be looking at the machine's TZ setting¹)

As the article demonstrates, whatever we might believe about best practice, actual practice seems to include a lot of cases where it is called.

Given that a given epoch time value can map to different dates & times, depending on timezone... I'm not sure why you think formatting dates & times wouldn't require considering the desired timezone.

You're similarly mistaken that logging a message involves hitting disk. It's a very common configuration for high throughput logs to buffer writing to disk across multiple messages and/or forward to a remote server.
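To illustrate the buffering point, here is a rough sketch of a logger that stamps every message with the local time but only writes to its sink once per batch. `BatchedLog` and its parameters are hypothetical, not taken from any real logging library:

```python
import io
import time


class BatchedLog:
    """Hypothetical logger that holds formatted lines in memory and writes them in batches."""

    def __init__(self, sink, batch_size=100):
        self.sink = sink
        self.batch_size = batch_size
        self.pending = []
        self.write_calls = 0  # number of write() calls made on the sink

    def log(self, message):
        # Every message still gets its own local-time stamp...
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        self.pending.append(f"{stamp} {message}\n")
        # ...but the sink is only touched once per full batch.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.sink.write("".join(self.pending))
            self.write_calls += 1
            self.pending.clear()


sink = io.StringIO()
log = BatchedLog(sink, batch_size=4)
for i in range(10):
    log.log(f"request {i} handled")
log.flush()
print(log.write_calls)  # 3 writes for 10 messages
```

With a setup like this, the per-message cost is dominated by timestamp formatting rather than by the write, which is the situation the comment describes.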

Similarly, SQL queries don't necessarily involve hitting the network (some don't even involve crossing an IPC boundary). Even if you do hit the network, once again, it is very common for multiple network requests to be buffered in user space before making a syscall, and of course a single SQL statement could involve more than one localized timestamp value (though I'd like to think in that case the local timezone would be cached).

> ¹one of my old teams had an informal rule that any invocation of datetime.datetime.now() was a bug.

Well, if you are writing in Python, then worrying about the syscall overhead of reading the local timezone would seem odd (and for that matter, Python does some odd things with timezones, so I'm not even sure this would reliably trigger the syscall).

>Breaking a seconds-since-epoch out into year/mo/day/etc. is "simple" math, and shouldn't require a filesystem access.

To do it simply yes, but not correctly. See the "Falsehoods programmers believe about time" series.

http://infiniteundo.com/post/25326999628/falsehoods-programm... http://infiniteundo.com/post/25509354022/more-falsehoods-pro...

> To do it simply yes, but not correctly.

No, doing it correctly doesn't require filesystem access either. I've read both articles in the past: neither refutes the point I made above. If I were incorrect, linking to an article that enumerates tens of things (some of them arguably incorrect) isn't useful.

If you're trying to imply that you need to take timezones into account, yes, you do. Yes, typically those definitions are stored on disk, but the context here is requiring filesystem access each and every time; most libraries (including glibc) will load the timezone definitions once, and keep them in memory. Thus, you can break a seconds-since-epoch out into year/mo/day/etc. with "simple" math, and it doesn't require a filesystem access. (Beyond the amortized one time load, but given the point and purposes of the article, I'm not considering that.)
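For the UTC case, the "simple" math can be sketched as below. This uses a well-known days-to-civil-date conversion on the proleptic Gregorian calendar; it is only an illustration, not what glibc or any particular library does, and a local-time version would additionally apply zone rules that have already been loaded into memory. The function name is an assumption for the example:

```python
from datetime import datetime, timezone


def utc_fields(epoch_seconds: int):
    """Split seconds-since-epoch into (year, month, day, hour, minute, second) in UTC."""
    days, rem = divmod(epoch_seconds, 86400)
    hour, rem = divmod(rem, 3600)
    minute, second = divmod(rem, 60)

    # Work on a calendar shifted to start on 1 March, so that the leap day
    # falls at the end of the (shifted) year.
    z = days + 719468                      # days since 0000-03-01
    era, doe = divmod(z, 146097)           # 400-year era, day within that era
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day of the shifted year
    mp = (5 * doy + 2) // 153              # shifted month, 0 = March
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day, hour, minute, second


# Cross-check against the standard library on a leap day.
check = datetime(2016, 2, 29, 12, 30, 15, tzinfo=timezone.utc)
assert utc_fields(int(check.timestamp())) == (2016, 2, 29, 12, 30, 15)
print(utc_fields(0))  # (1970, 1, 1, 0, 0, 0)
```

Nothing here touches the filesystem; only integer arithmetic is involved.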

Read the damn article. It explains how it's localtime (the function you need to format time in the user's time zone) that makes the stat call, to check whether the configured time zone has changed.
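The pattern being described (keep the parsed data in memory, but stat the file on every call to see whether it changed) can be sketched roughly like this. It is an analogy only: `StatCheckedFile` is a hypothetical class, the file is a temporary stand-in rather than the real /etc/localtime, and glibc's actual checks may differ:

```python
import os
import tempfile


class StatCheckedFile:
    """Hypothetical cache: keep a file's bytes in memory, re-read only when stat() shows a change."""

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None
        self._data = None
        self.loads = 0  # how many times the file was actually read

    def get(self):
        st = os.stat(self.path)               # one stat() on every call
        if st.st_mtime_ns != self._mtime_ns:
            with open(self.path, "rb") as f:  # full read only after a change
                self._data = f.read()
            self._mtime_ns = st.st_mtime_ns
            self.loads += 1
        return self._data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "localtime")     # stand-in file for the example
    with open(path, "wb") as f:
        f.write(b"zone-v1")
    cache = StatCheckedFile(path)
    for _ in range(1000):
        cache.get()
    loads_before_change = cache.loads         # still 1 after 1000 calls
    with open(path, "wb") as f:
        f.write(b"zone-v2")
    os.utime(path, ns=(10**9, 10**9))         # force a visibly different mtime
    data_after_change = cache.get()
```

A thousand calls cost a thousand stat() calls but only one read, which is why the debate here is about the price of stat() rather than of file I/O.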
> Read the damn article.

> Please don't insinuate that someone hasn't read an article

I read the article. Yes, localtime requires the call; that wasn't my point. My point was that for plenty of common, server-side code, either this isn't required, or is inconsequential.

The former case that I was considering is the formatting of timestamps into TZs in the context of a request being served by a server. Most server-side TZ conversions I've ever needed can't call localtime, b/c localtime is wired not to the user's timezone, but to the TZ of the machine the server's code is running on, which is typically either nonsense, UTC, or whatever the devs like. Server-side code needs (of course, YMMV) to use the user's TZ, whatever that may be, so I'm making calls to a library built for that, e.g., pytz, which doesn't need to stat() the machine's TZ as there is no point to doing so.

The other instances the author lists that do require localtime are instances where localtime's stat call is the least of your worries, as you're about to perform other operations that are much more expensive.

Sounds like a misunderstanding of "simple" :)
Hence the scare quotes around simple. The math is in no way straightforward, but it's nonetheless math, esp. once you have the TZ information (if required) in front of you. The point was that there are plenty of operations within a typical server-side codebase that either involve little-to-no syscalls (tagging a record with the current UTC time, or converting a UTC timestamp to an ISO formatted date and time for serialization on the wire, e.g., JSON) or are forced to hit really expensive syscalls, rendering a quite-likely-cached-in-RAM stat() moot (logging, SQL queries).
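As a small illustration of the UTC-only operations mentioned above (tagging a record and serializing an ISO timestamp to JSON), here is a sketch using Python's standard library. The record layout and the `utc_iso` helper are made up for the example:

```python
import json
import time
from datetime import datetime, timezone


def utc_iso(epoch_seconds: float) -> str:
    """Format an epoch timestamp as an ISO 8601 string, always rendered in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


# Hypothetical record tagged with the current time in UTC.
record = {"event": "login", "created_at": utc_iso(time.time())}
wire = json.dumps(record)

print(utc_iso(0))  # 1970-01-01T00:00:00+00:00
```

Because the zone is passed explicitly, the result does not depend on the machine's TZ setting.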
On your base system, yes. Lots of things can hook random syscalls, or environments might have syscall monitoring.

One example is the folks over at slack record every syscall for security auditing. https://slack.engineering/syscall-auditing-at-scale-e6a3ca8a...

Slack uses the Linux audit subsystem which is also certainly faster than you think it is. Consider how many system calls your typical application is issuing --- especially ones that are likely to be calling localtime() all the time, such as a web server. If system call auditing had that high of an overhead, everything would be horrifically slow --- but it isn't, because Linux audit sends its records out asynchronously and in batches.
https://www.redhat.com/archives/linux-audit/2015-January/msg...

of course this is RHEL 2.6.32 and it's open/close but 200000 sc/s vs 3000 sc/s shows it has some overhead. Maybe someone can rerun that test code on git and see what the overhead is.

This might be true for your system and libc, where calls like gettimeofday go fast thanks to things like the vDSO, but in general this isn't guaranteed at all. Even on x64, for certain libc implementations, like musl, if I recall correctly, syscalls are made the old-fashioned way by trapping 0x80, which would mean you would see a much bigger effect from reducing the number of syscalls.
There is no vDSO for calls to stat(2). The claim in the article was that by setting the TZ environment variable to ":/etc/localtime", one could save "thousands" of stat system calls. Even for old-fashioned system calls where you use trap 0x80, Linux is still amazingly fast.

This can actually be a problem, since there are applications like git which assume stat is fast, and so it aggressively stat's all of the working files in the repository to check the mod times to see if anything has changed. That's fine on Linux, but it's a disaster on Windows, where the stat system call is dog-slow. Still, I'd call that a Windows bug, not a git bug.

Does Windows have a stat() call? It is probably a function from some POSIX emulation layer, and maybe that is why it is not fast.
It's also a disaster on NFS.
Not quite. On x86_32, for complicated and ultimately ridiculous but nevertheless valid reasons, lots of syscalls on musl use int $0x80. I have a patch to make this fixable but Linus shot it down. Maybe I should try again.

On x86_64, syscalls only use SYSCALL. It's very fast if audit and such are off and reasonably fast otherwise. (I extensively rewrote this code recently. Older teardowns of the syscall path are dated.)

System calls on x86 are fast. Other archs behave differently. And the syscall time is not the only thing that matters, but also potentially yielding execution.
I thought they were fast because x86 has multiple register files, enough for kernel space and user space to have their own, so that entry/exit to system calls doesn't require flushing registers to L1 (in the common case).

If that's true, then one test where you have a single process spinning into and out of a single syscall will have very different performance characteristics than a test where you have more processes than processor cores, because context switches flush the TLB.

Somebody who knows actual things about x86 and so forth please tell me if I'm spouting 90s-era comp sci architecture textbook stuff that no longer applies.

They're fast because x86 has a decently fast privilege change mechanism for system calls and Linux works fairly hard to avoid doing unnecessary work to handle them. In the simplest case, registers are saved, a function is called, regs are restored, and the kernel switches back to user mode.

The asm code is fairly straightforward in Linux these days. I'm proud of it. :)

Check out the post linked from the article: https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g... to learn more about how system calls work on x86 Linux.
I did the same experiment on a Raspberry Pi 2. The net saving was 5.803 seconds, so 5.803 microseconds per call.

Obviously if you care about performance then you wouldn't be running your program on a Raspberry Pi in the first place. But for everything else there's this free speed up.

I build a bunch of home automation stuff (as a hobby) using Pis and other microcontrollers. Performance in those things translates almost directly to power savings, and is very desirable.

OTOH, I've never encountered an issue like this on those systems.. (yet)

System calls in Linux are not faster than not doing them.
Yeah, this is a perfect example of an unnecessary micro-optimization. Not only will you not see performance issues from this in the real world, it might cause problems down the road: since TZ isn't set this way by default, some apps may not expect it and may behave erroneously.

But it's neat information to have in the back of your head.

Unnecessary? I had a really bad experience with an ancient Skype version on a modern Ubuntu desktop, and the fix was to set the TZ environment variable to speed up the first login/history fetch. The Skype process was spending so much time doing useless work that it was noticeable.
That's just not possible to authoritatively state. The best you can do is "this shouldn't normally cause a noticeable impact on most systems".

As just one example, what if you're stat()ing over NFS with a busy, flaky and/or distant server? A bit of thought and you'll come up with a bunch of other times it suddenly starts to matter.

I did the same, but with 10M iterations:

    $ time ./tz     
    ./tz  2,24s user 6,28s system 98% cpu 8,612 total
    $ export TZ=:/etc/localtime
    $ time ./tz                
    ./tz  1,35s user 0,00s system 98% cpu 1,364 total
So 0.7 microseconds on my machine.
> TZ=:/etc/localhost

Hope this is just a typo in your comment, not the actual test ;)

This isn't a typo, but is part of the syntax used by the TZ variable. (The same format appears in the article itself.)

See `man timezone` on a Linux system[1]. Specifically, see the passage that I've quoted below. Note that this is the third of three different formats that the man page describes that you can use in TZ:

> The second format specifies that the timezone information should be read from a file:

    :[filespec]
> If the file specification filespec is omitted, or its value cannot be interpreted, then Coordinated Universal Time (UTC) is used. If filespec is given, it specifies another tzfile(5)-format file to read the timezone information from. If filespec does not begin with a '/', the file specification is relative to the system timezone directory. If the colon is omitted each of the above TZ formats will be tried.

[1]: https://linux.die.net/man/3/timezone
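The rules in the quoted passage can be sketched as a small path resolver. This only mirrors the wording of the man page excerpt; it is not the libc implementation, the function name is hypothetical, and the location of the "system timezone directory" is an assumption (/usr/share/zoneinfo is common on Linux):

```python
import os

# Assumption: the "system timezone directory" is /usr/share/zoneinfo, which is
# common on Linux but not guaranteed everywhere.
ZONEINFO_DIR = "/usr/share/zoneinfo"


def resolve_tz_filespec(tz_value, zoneinfo_dir=ZONEINFO_DIR):
    """Return the file a ':filespec' TZ value points at, or None when the quoted rule says UTC."""
    if not tz_value.startswith(":"):
        raise ValueError("only the ':[filespec]' form is handled in this sketch")
    filespec = tz_value[1:]
    if not filespec:
        return None                               # filespec omitted -> UTC
    if filespec.startswith("/"):
        return filespec                           # absolute path, used as given
    return os.path.join(zoneinfo_dir, filespec)   # relative to the zone directory


print(resolve_tz_filespec(":/etc/localtime"))  # /etc/localtime
```

The sketch deliberately does not check whether the resolved file exists, so it says nothing about how a missing file is treated.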

Sure, but on none of my Linux systems is there a file called /etc/localhost. I think the parent was referring to /etc/localtime. Not sure what the behaviour is if a non-existent file is specified: perhaps the "value cannot be interpreted" case applies, but it's not perfectly clear, since it could be argued that the value is valid and just refers to a non-existent file.
Ah, correct you are! :-) I had missed that myself, and the : syntax is so rarely seen I naturally assumed that was what was intended.