Hacker News new | ask | show | jobs
by pitdicker 948 days ago
This also caused a lot of trouble for time libraries in Rust. The two foundational libraries, chrono and time, rely on localtime_r to get the local time instead of the clock value in UTC. localtime_r reads the TZ environment variable (and optionally others like TZ_DIR). Rust declares it safe to modify the environment, while POSIX declares it unsafe.

CVE-2020-26235, RUSTSEC-2020-0071 and RUSTSEC-2020-0159 where opened against the crates. That left the Rust ecosystem with a pretty much unsolvable issue for many months. Chrono went with the solution to parse the timezone database of the OS natively and read the environment using the Rust locks. Time tries to detect if the libc version has thread-safety guarantees to access the environment, and otherwise panics if there are multiple threads.

More reading: https://docs.rs/chrono/latest/chrono/#security-advisories

5 comments

To be exact, it was Chrono and time-rs 0.1, while time-rs 0.2 and later was rewritten from the scratch and didn't have that issue... because the new time-rs didn't yet support general time zones other than fixed offsets. The accepted solution for Chrono surprised me a lot, because as far as I reckon it was the hardest solution. (Disclaimer: I'm the original author of Chrono.)

But a bad API design doesn't end at environment variables. Many POSIX systems rely on `/etc/localtime` to define the system-wide time zone, and every `localtime` call has to check if the file has been changed or not because there is no way to subscribe to the system-wide time zone change event. Of course there is a cache, but many libcs call at least `stat` per each `localtime` call AFAIK. I had even experienced a possible glibc bug due to the lack of guard against I/O error during this process [1]. Windows got this right, I can't see why POSIX couldn't do the same when it does have an asynchronous signal delivery mechanism anyway.

[1] https://news.ycombinator.com/item?id=9953898

Unix was "designed" (if you can call it that) a long time before it was possible to move a running system between timezones. So many of these decisions were made in completely different circumstances (I almost said environment) and are laying around like old WWII bombs just waiting for someone to dig one up.
Presumably, GNU Hurd will fix these issues without introducing fun new ones.
Poe's law; I can't tell if this is a joke. I was under the impression that Hurd is not going to be ready for usage any time soon (a while).
That is the joke.
My respects for your work on Chrono!

And you are right about time-rs (or I think you are). Version 0.1 was never fixed, and version 0.3 does the OS and thread count checks.

It does have some advantage for chrono to do everything in Rust: it can now return two results for ambiguous local time during DST transition fold, and properly return None during a transition gap.

> My respects for your work on Chrono!

Thank you. To be frank as a first-time maintainer I did a mediocre job---my biggest regret for Chrono is that I did know most forthcoming issues beforehand and yet didn't take enough time to make them public and explicit so that someone else could prepare for the future.

> Many POSIX systems rely on `/etc/localtime` to define the system-wide time zone, and every `localtime` call has to check if the file has been changed or not because there is no way to subscribe to the system-wide time zone change event.

But you can subscribe to file change events so why not do that?

I did seriously consider inotify back in time, but in order to take advantage of inotify I had to parse all binary TZif files (because otherwise I still had to call `localtime` that would `stat` every time anyway). It was so cumbersome, that was only halfway finished when I stepped down as a maintainer. Hence my surprise when I learned that someone actually did implement all of them.
Offhand, and a quick google search; I was unable to find the exact definition / specification for how time-zone data must be obtained rather than how it happens to conventionally be obtained.

It is entirely reasonable that any of the following _might_ be valid behavior.

* Simple but syscall heavy approach which re-reads the env, and possibly /etc/localtime each call and has no stability. (Results may mutate as other processes / threads change things.)

* Same as above, then caches the decision result for some application specific reasonable time; which may be until the application exits.

* The elsewhere mentioned stat / inotify approaches that only track updates to /etc/localtime (and ideally update the cached decision result when notified).

All approaches seem valid. It's sort of like the hostname or any other system level configuration where a reboot may be a reasonable expectation for a complete update.

That was also my thought. To my knowledge `/etc/localtime` is the creation of Arthur David Olson, the founder of the tz database (now maintained by IANA), but his code never read `/etc/localtime` multiple times unless `TZ` environment variable was changed. Tzcode made into glibc but Ulrich Drepper changed it to not cache `/etc/localtime` when `TZ` is unset [1]; I wasn't able to locate the exact rationale, given that the commit was very ancient (1996-12) and no mailing list archive is available for this time period.

[1] https://github.com/bminor/glibc/commit/68dbb3a69e78e24a778c6...

I believe systemd has a way to subscribe to timezone changes.
> Rust declares it safe to modify the environment, while POSIX declares it unsafe.

Arguably, Rust declares it is safe to modify the environment through its stdlib methods. The tricky detail is that this means it is unsafe to read/modify the environment through other means, but sometimes this is really hard to avoid.

Does rust also add an pthread_atfork handler? Otherwise, it seems likely still unsafe for rust to claim to support calling fork (for execv) or posix_spawn, as most libc call realloc on the `environ` contents, but do not appear to take any care to ensure that (v)fork/posix_spawn doesn't happen concurrently with that. Worse yet, the `posix_spawnp` API takes an `envp` parameter and expects you to pass it the global pointer `environ`, which is completely unsynchronized across that fork call. It is not obvious to me that this is a security gap, but certainly it seems to me that this would violate rust's safety claim, if it is not taking added precautions there.

The Apple Libc appears to just unconditionally drops the environ lock in the child (https://github.com/apple-oss-distributions/Libc/blob/c5a3293...), while glibc doesn't appear to even bother with that (https://github.com/bminor/glibc/blob/6ae7b5f43d4b13f24606d71...)

I don't think Rust's stdlib provides any kind of safe way to call just fork(), it only has methods for creating child processes because that's the only interface that works on every supported Tier 1 platform. Calling fork is always going to necessarily be an unsafe{} libc call or syscall, and the caller will have to take care to ensure nothing funny is going on.
There are OS specific APIs where needed, probably also for threads.
`std::os::unix does` adds some additional methods in that vein like exec(), but no fork(). `std::os::linux` only adds the ability to get `pidfd`s for child processes you create. There's simply no safe way for the stdlib to provide safe fork() without knowing a lot of things about how you're going to set up your process and what other libraries you might pull in that may not be fork-safe. If you're willing to ensure you only call it in a safe way, you can still call fork, the language just cannot guarantee it will be safe, same as when you're doing it in C.
> The tricky detail is that this means it is unsafe to read/modify the environment through other means, but sometimes this is really hard to avoid.

If you have C and Rust in the same process and C code calls setenv(3), for one ...

Edit: why downvotes? It's very typical to link to C libraries which may call the libc environment stuff ... My point is you can't control library code as easily, if it's some dependency of a dependency eventually calling libc.

If you are modifying TZ while another thread is relying on it to calculate time, those threads are racing, and hiding the crash won't solve the race: the reading thread will now randomly return values in the wrong timezone instead, subsequent code will use it in whatever operation it is it wanted the time for, the end result will be garbage, and this will be super hard to debug because there won't be a loud obvious crash pointing to the root cause and also depending on the winner of the race the symptoms will be random/intermittent.

Fix the high level race, and suddenly you no longer need the low level mutex.

> the reading thread will now randomly return values in the wrong timezone instead, subsequent code will use it in whatever operation it is it wanted the time for, the end result will be garbage,

I really strongly disagree with how bad you seem to think this is. If you are designing your application to use the timezone and modify it at the same time, it is a totally natural consequence that you may see the previously set time zone in a timing dependent fashion. That's the nature of the beast. To "solve this" is seemingly to make that other thread capable of time travel or something. It read something before it was written, and acted on it. Reasonable!

The harmful data races are when you read intermediate results. If setting the timezone is a multi-step process, or involves manipulation on complex data structures with pointers that might be deallocated, then you are in grave danger. Seeing a previously valid result is ... I honestly don't know how you'd expect to solve it without threads being able to see the future, or some other unreasonable expectation.

> If you are modifying TZ while another thread is relying on it to calculate time

environ is a single contiguous null-terminated segment of null-terminated key-value pairs; any change of any environment variable might reallocate it, changing the address and invalidating the old address.

Also why it's a bad idea to store the pointer returned by getenv, it might be invalidated by any environment modification.

The strings in environ is only contiguous at program start. In every libc I'm aware of, both putenv and setenv replace only the specified key-value pair (and possibly environ itself, if it needs to be larger) and should not affect the address of any other environment variables. It is still thread-unsafe, but far more limited in its unsafety.
In current glibc master, it's unsafe for any putenv/setenv to race with any getenv, even if the variable names are different, for two reasons. (Note that multiple calls to putenv/setenv are serialized by a lock, but getenv does not take the lock.)

(1) setenv resizes environ using realloc, which frees the old buffer, so getenv can end up reading from a freed array.

(2) The code does not use atomics or memory barriers, so on weakly ordered architectures, getenv could observe another thread's write to one of the pointers in the environ array, or to the environ pointer itself, while observing stale values for the memory behind it.

In both cases, getenv could end up returning a bogus pointer or just crashing.

However, those issues can be fixed without changing the API, and at least Apple's libc seems to do the right thing here. On the other hand, other libcs such as musl, FreeBSD libc, and even OpenBSD libc (!) do worse than glibc and have no locking at all.

If someone could convince the maintainers of all those libcs to add a lock and make getenv/setenv 'thread safe as long as you're not racing on the same variable name', then that would be a good starting point. But in my opinion it would still be a half-measure. We need a fully thread-safe environment.

And honestly, it might be easier to convince the maintainers to add a full solution than a half-measure, even if it involved API changes. (But it may be hard either way. Rich Felker showed up in a Rust thread a while back and was highly negative on the idea of making any changes to musl.)

IMHO - I am sympathetic to the BSDs, Apple (presumably forked BSD), and musl approach.

In what sane world would someone reasonable treat (initial shell) Environment Variables as a proper ACID complaint database? About the 'best' solution I can see for preventing segmentation faults related to resizing the env array during runtime is to defer reclaiming freed memory chunks until after all in-process threads have been given another uninterrupted timeslice to process. Even that wouldn't be 100% but probably would cover any not pathological case.

atomic ordering is very easy if you don't care about performance. So on the other hand we could ask why get/put/setenv have such a terrible need for performance that we can't afford to put a simple lock around them.
The shame of those CVE is that it created a split in the rust community between chrono and time. For a time it looked like people were all moving to time (which handling on TZ is a bit stupid IMO since it just refuses to work if there is more than one thread). But with chrono 0.4 now things are stale and there is no clear winner anymore.

I would argue that those splits are in great part responsible for the feeling that rust is hard to learn. I remember to have had to dig into pretty complex time code to understand why it broke our program that relied on timezone when we switched from chrono to time. It hinders your productivity for sure even if you learn the how.

> Rust declares it safe to modify the environment, while POSIX declares it unsafe.

There's your problem right there, and it ain't the behavior specified in the standard.

But Rust doesn't declare it safe to modify the environment in general. It declares it safe to modify the environment using std::env::set_var, which uses locking internally. The docs explicitly note that there's potential unsafety if non-Rust code modifies the environment:

"Note that while concurrent access to environment variables is safe in Rust, some platforms only expose inherently unsafe non-threadsafe APIs for inspecting the environment. As a result, extra care needs to be taken when auditing calls to unsafe external FFI functions to ensure that any external environment accesses are properly synchronized with accesses in Rust."

https://doc.rust-lang.org/std/env/fn.set_var.html

Ultimately the problem here is with Posix. Rust can only do so much to paper over the pitfalls in the underlying platform.

Although note that if you replace libc with eyra, then the behavior goes from thread-unsafe to "just" a memory leak: https://blog.sunfishcode.online/eyra-does-the-impossible/

There is an issue in the std to name setenv unsafe but that is a breaking change so it's complicated.
One problem is that marking that function as unsafe would unfairly penalize platforms like Windows that don't have this issue. Even if it turns out to be the least-bad compromise solution, it sure would be nice if we could have nice things.
You are right. POSIX specifies one thing, the standard library in Rust and some other libraries specifies something different. 'Safe to use unless there are other threads' is not really something you can or want to encode in a type system.

But libraries and users are caught in the middle.

It is safe to use the Rust standard library interface.
Unless the environment is also touched by a part of the program written in Go, Julia, I don't know... The lock is not shared across languages.
> The lock is not shared across languages.

Which just to be clear: it cannot without changing the standard. There is really nothing anyone can do without a change in the standard.

However, to access any of those languages from rust you need to use unsafe.
There is no safe way to access the environment, even if you mark this API unsafe, what are you going to do?