by asveikau 2495 days ago
I remember when CVE-2009-1897 was current, because it is an obvious example where no one would expect the null check to work:

    struct sock *sk = tun->sk;
    unsigned int mask = 0;
    if (!tun)
        return POLLERR;

Obviously "sk" is not used until after the null check, but if we read the code line by line, the way we would expect a naive compiler with no optimization to act, the pointer is followed before the null check, and a null pointer should crash the program. Anyone expecting the check to work would have to assume that the initial assignment to sk is evaluated lazily at first use, which is a very strange assumption.

Still, I remember in 2009 people writing about that snippet as if it were a surprising result and the compiler had done something wrong.
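To make the ordering problem concrete, here is a minimal sketch with the check placed after and before the dereference. `struct sock_sketch`, `struct tun_sketch`, `poll_buggy`, `poll_fixed`, and `POLLERR_SKETCH` are hypothetical stand-ins for illustration, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct sock_sketch { int readable; };
struct tun_sketch  { struct sock_sketch *sk; };

enum { POLLERR_SKETCH = 8 };  /* illustrative error value */

/* Same shape as the snippet above: tun->sk is loaded before the check.
 * Calling this with NULL is undefined behavior, so a compiler may assume
 * tun != NULL and drop the `if (!tun)` branch entirely. */
unsigned int poll_buggy(struct tun_sketch *tun)
{
    struct sock_sketch *sk = tun->sk;   /* dereference happens here */
    unsigned int mask = 0;
    if (!tun)                           /* too late: may be optimized away */
        return POLLERR_SKETCH;
    if (sk && sk->readable)
        mask |= 1;
    return mask;
}

/* Check first, dereference second: well defined for NULL input. */
unsigned int poll_fixed(struct tun_sketch *tun)
{
    unsigned int mask = 0;
    if (!tun)
        return POLLERR_SKETCH;
    struct sock_sketch *sk = tun->sk;
    if (sk && sk->readable)
        mask |= 1;
    return mask;
}
```

Only `poll_fixed` is safe to call with NULL; `poll_buggy` behaves identically for valid pointers, which is why the bug is easy to miss.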


The article addresses this.

The reasoning is that, since the pointer has already been dereferenced (and has not been changed), it cannot be NULL. So there is no point in checking it. This logic makes perfect sense except in the case of the kernel, where NULL might actually be a valid pointer. The default SELinux module allowed mapping the zero page, converting this bug into a privilege escalation flaw. This was later corrected, however, by preventing processes running as unconfined_t from being able to map low memory in the kernel.

EDIT: On rereading your comment, I think I realized you might be getting at something a bit different, which is that even if NULL is a valid address, no one in their right mind should be dereferencing it, so this code is still illogical from a human perspective (doing a NULL check after dereferencing) and there is no good reason to write it that way. That seems to make sense to me, but I don't have any production C experience.

When NULL corresponds to a valid address, you don't want the check stripped out of this sort of code.

Out of curiosity, what are the expected contents of the zero page area? Is access allowed to it just because it's coming from the kernel instead of a userland process?

That is platform dependent, and doesn’t matter. NULL need not be the ‘all zeroes’ bit pattern (1), and even if it did, the C standard says dereferencing a NULL pointer leads to undefined behavior (https://en.wikipedia.org/wiki/Null_pointer#Null_dereferencin...)

(1) Recent C standards have backpedaled a bit on ‘it should be possible to write a conforming C compiler for every CPU ever made’ (for example, IIRC, by fixing a char to be 8 bits), so that might be a thing of the past.
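As a small illustration of the bit-pattern point, the sketch below inspects how the current platform actually stores a null pointer. The function name `null_is_all_zero_bytes` is made up for this example; on common desktop platforms it is expected to report all-zero bytes, but the standard does not promise that.

```c
#include <string.h>

/* Returns 1 if this platform stores a null pointer as all-zero bytes,
 * 0 otherwise. The C standard permits either answer. */
int null_is_all_zero_bytes(void)
{
    void *p = NULL;
    unsigned char bytes[sizeof p];

    /* Copy the pointer's object representation so it can be examined
     * byte by byte without dereferencing anything. */
    memcpy(bytes, &p, sizeof p);
    for (size_t i = 0; i < sizeof p; i++)
        if (bytes[i] != 0)
            return 0;
    return 1;
}
```

Either result leaves the main point untouched: dereferencing the null pointer is undefined regardless of how it is represented.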

On my ARM Cortex processor it's __StackTop

On a read it's a completely valid address. Write generates a bus fault.

Atmel ATMega parts it's the reset vector. A write does nothing.

Braino. ATMega parts address 0 is the R0 register.

In this case it's because it's in the kernel (why don't they unmap it? There's presumably a reason...) and I've no idea whether it might contain something useful or not.

There are systems, typically older ones, where all addresses are valid, including whichever NULL corresponds to. Real mode x86 is one such system - the bottom of memory there contains the vector table.

As an example: 16 bit x86 puts the interrupt table at linear address 0.

On later x86 you can map that page to whatever you want in kernel mode and it will work. But expect C programs to do weird things for what should be crashing bugs.

The other replies to your post talk about various older or embedded systems, but here's the answer for the typical systems actually affected by that CVE, running 32-bit or 64-bit x86:

If nothing is mapped at 0, the kernel will fault just like userland would. This results in a kernel panic.

However, the kernel and userland share an address space. On a 32-bit x86 system with the default configuration, Linux allocates addresses 0 to 0xc0000000 to userland, and 0xc0000000 to 0xffffffff to the kernel. [1] (Each userland process had its own page table, but the kernel mapped itself into every page table.) This is unavoidable to some extent, because an interrupt or system call switches the system to kernel mode and jumps to a kernel-provided address, but does not automatically swap the page table, so at least the interrupt handler needs to be mapped into every page table. [2] x86-64 is similar, but with the upper half of the address space reserved for the kernel.
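The 32-bit split described above amounts to a single comparison. This is only a sketch: `KERNEL_BASE_SKETCH` and `is_kernel_address_sketch` are hypothetical names, and the boundary is the default value quoted in this comment, not something every configuration uses.

```c
#include <stdint.h>

/* Boundary quoted above for 32-bit x86 Linux with the default split.
 * Hypothetical name; other configurations use other values. */
#define KERNEL_BASE_SKETCH 0xc0000000u

/* Returns nonzero if addr falls in the range reserved for the kernel. */
int is_kernel_address_sketch(uint32_t addr)
{
    return addr >= KERNEL_BASE_SKETCH;
}
```

Address 0 lands on the userland side of that boundary, which is what makes the rest of the story possible.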

So why can't user code mess with the kernel's data? Each entry in the page table has a single privilege level bit. If it's set, both user and kernel code can access the page; if it's clear, only kernel code can access it. [3] At the time, there was no way to make memory accessible from user code but not the kernel, as that was considered unnecessary. Thus, userland couldn't access kernel pointers, but the kernel could directly load/store pointers belonging to the current user process. This was used intentionally when the kernel needed to copy data in or out of the process, but it also meant that if the kernel code accidentally dereferenced a bad pointer, it could end up referring to userland data.
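A toy model of that rule, assuming only the single privilege bit matters (the present bit, write permission, and everything else in a real page-table entry are ignored). `PTE_USER_SKETCH` and `access_allowed_sketch` are hypothetical names; the bit position matches the User/Supervisor bit of an x86 page-table entry.

```c
#include <stdint.h>

/* Bit 2 of an x86 page-table entry is the User/Supervisor bit.
 * The macro name is made up for this sketch. */
#define PTE_USER_SKETCH (1u << 2)

/* Pre-SMAP rule: kernel mode may touch any mapped page, while user mode
 * may only touch pages whose entry has the user bit set. */
int access_allowed_sketch(uint32_t pte, int cpu_in_kernel_mode)
{
    if (cpu_in_kernel_mode)
        return 1;
    return (pte & PTE_USER_SKETCH) != 0;
}
```

Note the asymmetry: there is no input for which kernel mode is refused, so a stray kernel pointer into userland memory simply works.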

That included the null pointer: accesses to it would succeed if and only if the current user program had previously mapped something at address 0, via mmap() with the MAP_FIXED flag. And that's what the exploit code did.

The page tables are under the kernel's control, so the kernel could make null pointer dereferences unexploitable (resulting in a kernel panic but nothing more) simply by refusing to allow user processes to map memory at address 0 – and in fact Linux already had a setting to do so (mmap_min_addr). But it was a setting rather than simply hardcoded into the kernel, because... well, some real software actually depends on mapping address 0 for silly reasons, mostly pseudo-emulation software like dosemu and wine which directly runs the emulated code in its address space. So not all systems had the setting enabled, and there was also a separate issue where enabling SELinux would cause mmap_min_addr to be ignored. [4]

Years later, Intel added an extension in newer CPUs called SMAP (Supervisor Mode Access Protection), which is simply a flag that makes the kernel fault if it tries to access pages marked as accessible to userland. In other words, the privilege bit now selects between kernel only and user only. Much saner – after all, even with mmap_min_addr blocking exploitation of null pointers, other garbage pointers could still end up pointing to userland, which made it easier to exploit the kernel (though, compared to the situation with null pointers, it's more often a question of "how easy is it to write an exploit" or "how reliable is the exploit" than of exploitable versus unexploitable). The kernel and userland still share an address space, though, so the kernel can just toggle off the flag when it's intentionally accessing userland data.
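The SMAP rule can be modelled the same way as the privilege bit. This is a sketch with hypothetical names; `override` stands for the flag the kernel toggles around intentional userland accesses (on x86 that is, as far as I know, the AC flag, set and cleared with the STAC and CLAC instructions).

```c
/* Returns nonzero if kernel-mode code may access the page.
 * page_is_user: the page is marked accessible to userland.
 * smap_enabled: the CPU feature is switched on.
 * override:     the kernel has temporarily lifted the restriction. */
int kernel_access_allowed_sketch(int page_is_user, int smap_enabled,
                                 int override)
{
    if (!page_is_user)
        return 1;          /* kernel-only pages are always fine */
    if (!smap_enabled)
        return 1;          /* old behavior: kernel can touch anything */
    return override != 0;  /* SMAP: user pages need an explicit opt-in */
}
```

With SMAP on and no override, an accidental kernel dereference of a userland address, including a page mapped at 0, faults instead of succeeding.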

(Even later, the Meltdown hardware vulnerability triggered the implementation of kernel page-table isolation, but that's another story.)

[1] https://lwn.net/Articles/75174/

[2] https://stackoverflow.com/questions/32598810/does-cr3-change...

[3] https://webcache.googleusercontent.com/search?q=cache:b5g4ss...

[4] https://blog.namei.org/2009/07/18/a-brief-note-on-the-2630-k...

Thanks for the very detailed explanation!

If I understand correctly, a page mapped by a process at address zero allows both userland and kernel code to trigger unexpected code paths, since page access isn't exclusively kernel or userland. The optimizations mentioned in TFA add even more potential for issues, since userland code could control pointers in that zero page to point to arbitrary data in userland that the kernel can read.

This is fascinating, I didn't know it was possible to share pages between userland and the kernel, and always assumed those two were strictly segregated.

Yep. Something I didn't mention is that if you just try to allocate memory without using MAP_FIXED to force a particular address, the kernel will never choose address 0, regardless of the value of mmap_min_addr. That's true even if the entire rest of the address space is filled. Therefore, userland programs can rely on accesses to address 0 causing a fault unless they specifically ask to map it, which makes the compiler optimization in question perfectly reasonable for most of them. After all, a userland program doesn't worry about being exploited by itself.

(There's still potential for unexpected behavior in those userland programs that do map 0, like wine and dosemu. Even if those programs themselves are compiled with -fno-delete-null-pointer-checks – I'm not sure whether they are – they link to system libraries which aren't. Oh well.)

The assignment merely takes the address of tun and adds a small value. That wouldn’t cause a crash.

The surprise is that the if (!tun) check is optimized away because, if tun is NULL, the assignment causes undefined behavior, which the compiler does not have to take into account.

No, it fetches it - tun->sk is copied. The address calculation case would be something more like struct sock * * p=&tun->sk.
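The difference between the two expressions can be sketched as follows. The struct layout and the names `tun_layout_sketch`, `sock_layout_sketch`, `sk_field_offset`, and `load_sk` are hypothetical; the extra `flags` member is only there so that `sk` sits at a nonzero offset.

```c
#include <stddef.h>

struct sock_layout_sketch;               /* opaque; never dereferenced here */
struct tun_layout_sketch {
    int flags;                           /* pushes sk to a nonzero offset */
    struct sock_layout_sketch *sk;
};

/* &tun->sk only computes "address of tun plus a small offset";
 * nothing is read from memory. */
size_t sk_field_offset(struct tun_layout_sketch *tun)
{
    return (size_t)((char *)&tun->sk - (char *)tun);
}

/* tun->sk reads the pointer value stored at that address. */
struct sock_layout_sketch *load_sk(struct tun_layout_sketch *tun)
{
    return tun->sk;
}
```

Both functions are only meant to be called on a valid object; as far as I understand the standard, neither form is defined for a null `tun`, even though only the second one typically touches memory.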