HN Mirror

by shawn 2878 days ago

It’s because args to syscalls are passed in registers rather than the stack. This is a security mechanism I believe, but I’m mostly guessing based on xv6.

Basically, if you want a kernel space and a user space, you have to ensure users can’t breach kernel space. But this is the part where my logic runs dry: could a malicious caller control the return address that’s pushed to the stack? If so, could you redirect the kernel’s execution to an arbitrary physical address? Or does the kernel switch back into user mode just before calling RET?

Sigh... time to re-read xv6. I think interrupts are involved.

3 comments

monocasa 2878 days ago

> It’s because args to syscalls are passed in registers rather than the stack. This is a security mechanism I believe, but I’m mostly guessing based on xv6.

It's probably for speed reasons. Marshaling arguments from user space is expensive because of all the checks you have to make so that user code can't crash the kernel.

> Basically, if you want a kernel space and a user space, you have to ensure users can’t breach kernel space. But this is the part where my logic runs dry: could a malicious caller control the return address that’s pushed to the stack? If so, could you redirect the kernel’s execution to an arbitrary physical address? Or does the kernel switch back into user mode just before calling RET?

Return from interrupt uses the special iret instruction. It makes sure the return lands in a user context when it should, by atomically restoring the instruction pointer, code segment, and flags (plus the stack pointer when the privilege level changes).
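To make that restore concrete, here is a toy model in C of the frame a 32-bit x86 iret consumes. It is a sketch, not kernel code: the struct and function names are made up for illustration, and the same-privilege stack adjustment is left out.

```c
#include <stdint.h>

/* Frame popped by a 32-bit x86 iret, lowest address first. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;   /* popped only when returning to a less privileged ring */
    uint32_t ss;    /* likewise */
};

/* Hypothetical CPU state holding only the registers this model needs. */
struct cpu_state {
    uint32_t eip, cs, eflags, esp, ss;
};

/* The privilege level after the return is the low two bits of the saved CS. */
unsigned frame_cpl(const struct iret_frame *f)
{
    return f->cs & 3u;
}

/* Model of iret: eip, cs and eflags are replaced as one step, so there is
 * no moment where user code runs with kernel privilege or vice versa. */
void model_iret(struct cpu_state *cpu, const struct iret_frame *f)
{
    unsigned old_cpl = cpu->cs & 3u;

    cpu->eip = f->eip;
    cpu->cs = f->cs;
    cpu->eflags = f->eflags;

    /* Returning outward (for example ring 0 to ring 3) also restores
     * the user stack from the frame. */
    if (frame_cpl(f) > old_cpl) {
        cpu->esp = f->esp;
        cpu->ss = f->ss;
    }
}
```

With a saved cs of 0x1B (an illustrative user selector), frame_cpl reports ring 3 and model_iret also swaps in the saved user esp and ss.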

Taniwha 2878 days ago

Yes, exactly this. Once upon a time (V6/V7 on the PDP-11), when I was younger, syscall parameters were on the stack. I worked for a company that ported Unix to various CPUs/MMUs; we'd knock one out every 6 weeks or so. On some MMUs, accessing user space from kernel space (safely) was extremely slow. We discovered that passing syscall parameters on the stack was a real performance hog, and benchmarking showed that passing them in registers was far faster on all systems. Our systems supported both sorts of system calls. When I wrote the original 68k System V ABI, I included register passing as the default.
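The cost difference can be sketched with a toy model in C, loosely in the spirit of xv6's argument-fetching helpers. Everything here (struct toy_proc, the function names, the flat 256-byte "user memory") is an assumption made for illustration. The point is only that a stack argument needs a validated read of user memory, while a register argument is already sitting in state the kernel saved on entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy process: a flat "user memory" array plus saved registers. */
struct toy_proc {
    uint8_t  mem[256];   /* stands in for the user address space */
    uint32_t size;       /* valid user addresses are [0, size), size <= 256 */
    uint32_t user_sp;    /* user stack pointer at the time of the trap */
    uint32_t regs[4];    /* argument registers saved on kernel entry */
};

/* Fetch a 32-bit value from user memory, refusing out-of-range addresses. */
int fetch_user_u32(const struct toy_proc *p, uint32_t addr, uint32_t *out)
{
    if (addr >= p->size || p->size - addr < sizeof(uint32_t))
        return -1;
    memcpy(out, &p->mem[addr], sizeof(uint32_t));
    return 0;
}

/* Stack convention: argument n sits just above the return address. */
int arg_from_stack(const struct toy_proc *p, unsigned n, uint32_t *out)
{
    return fetch_user_u32(p, p->user_sp + 4 + 4 * n, out);
}

/* Register convention: no user memory access, so no address check. */
int arg_from_regs(const struct toy_proc *p, unsigned n, uint32_t *out)
{
    if (n >= 4)
        return -1;
    *out = p->regs[n];
    return 0;
}
```

In a real kernel the check in fetch_user_u32 is where the MMU-dependent cost shows up; the register path skips it entirely.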

cptnapalm 2877 days ago

I'm learning PDP-11 assembly and would like to play around with some OS stuff. This was very helpful to know. Thanks.

nineteen999 2877 days ago

To be utterly pedantic, on Intel Linux only the "legacy" 32-bit int 0x80 mechanism uses iret to return.

32-bit "fast" syscalls use sysenter/sysexit. 64-bit "fast" syscalls use syscall/sysret.

I haven't really looked, but I suspect sysexit and sysret are somewhat special-cased versions of iret.

https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...
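For reference, a minimal sketch of the 64-bit path in C: a write(2) issued with the raw syscall instruction, with the arguments placed in registers. It assumes Linux on x86_64 with GCC or Clang inline assembly; the function name raw_write is made up, and on other platforms it falls back to the libc syscall() wrapper.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Issue write(fd, buf, len) without going through the libc write() wrapper.
 * Linux x86_64 convention: syscall number in rax, arguments in rdi, rsi, rdx;
 * the syscall instruction itself overwrites rcx and r11. */
long raw_write(int fd, const void *buf, unsigned long len)
{
#if defined(__x86_64__) && defined(__linux__)
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "a"((long)SYS_write), "D"((long)fd), "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
    return ret;   /* raw result: a negative errno value on failure */
#else
    /* Fallback for other platforms: returns -1 and sets errno on failure. */
    return syscall(SYS_write, fd, buf, len);
#endif
}
```

Calling raw_write(1, "hi\n", 3) should print to standard output and return 3. Nothing is pushed on the user stack for the kernel to read back.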

barco 2878 days ago

In x86 and x86_64 it doesn't matter whether the caller has control or not over the return address, because there's a change of privilege when the kernel returns from the interrupt (IRET instruction). So at that point it would be equivalent for the userspace app to just jump to whatever address it wants.

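The caller does not have control over the return address. When an int n instruction is executed, it's the processor that pushes the current context onto the kernel stack (pointed to by ss0:esp0), so when you run iret, everything goes back to normal. (The syscall instruction pushes nothing; it saves the return address in rcx and the flags in r11, and the kernel is responsible for preserving them.)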

Even if the caller had control over this return address, the CR3 does not change [without taking KPTI into consideration], so the memory mappings will still be the same, and everything would be handled with paging enabled, so there's no "arbitrary physical address". You would only be allowed to jump to anything that you have already mapped, and given that there's a privilege change, you would only be able to access userspace memory.

This has nothing to do with whether the syscall parameters are passed on the stack or not. On x86 and x86_64, when you make a syscall and the kernel handles it, the stacks change, so if you were to pass parameters via the stack, the kernel would need to access the userspace stack, which sounds like a mess (but is possible). The registers, on the other hand, are available for the syscall handler to use, so it's easier to just set the parameters there.
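The paging point can be sketched as a check on a page table entry's flag bits. This is a simplified model in C, assuming the classic 32-bit x86 entry layout (present = bit 0, writable = bit 1, user = bit 2); the function name is made up, and features such as NX, SMEP, SMAP and CR0.WP are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_P 0x1u  /* present */
#define PTE_W 0x2u  /* writable */
#define PTE_U 0x4u  /* accessible from ring 3 */

/* Would an access at privilege level cpl be allowed by this entry? */
bool pte_allows(uint32_t pte, unsigned cpl, bool is_write)
{
    if (!(pte & PTE_P))
        return false;   /* not mapped: page fault */
    if (cpl == 3 && !(pte & PTE_U))
        return false;   /* supervisor-only page: page fault for user code */
    if (is_write && cpl == 3 && !(pte & PTE_W))
        return false;   /* read-only page; supervisor writes depend on CR0.WP */
    return true;
}
```

So even if user code ends up at an address of its choosing after the return, a kernel page without PTE_U still faults for it.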

userbinator 2878 days ago

It depends on the exact architecture; on x86 the exact logic of the INT instruction is quite involved (see e.g. https://x86.puri.sm/html/file_module_x86_id_142.html for details) but when changing privilege levels, the CPU automatically switches stacks too.
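A rough model of that stack switch in C, for the 32-bit case where int n raises privilege from ring 3 to ring 0. All names are hypothetical, the kernel stack is an array of words indexed from the top rather than real memory, and details such as error codes and clearing the interrupt flag are left out.

```c
#include <stdint.h>

struct toy_cpu  { uint32_t eip, cs, eflags, esp, ss; };
struct toy_tss  { uint32_t ss0; unsigned esp0; };   /* esp0: index into kstack */
struct toy_gate { uint32_t cs, eip; };              /* handler entry point */

/* Model of `int n` with a privilege change: the CPU takes the kernel stack
 * from the TSS and pushes ss, esp, eflags, cs, eip there, in that order.
 * User code never gets to write to this stack.
 * Returns the new top-of-stack index. */
unsigned model_int(struct toy_cpu *cpu, const struct toy_tss *tss,
                   struct toy_gate gate, uint32_t *kstack)
{
    unsigned top = tss->esp0;

    kstack[--top] = cpu->ss;
    kstack[--top] = cpu->esp;
    kstack[--top] = cpu->eflags;
    kstack[--top] = cpu->cs;
    kstack[--top] = cpu->eip;

    /* Execution continues in the handler, on the kernel stack segment. */
    cpu->ss = tss->ss0;
    cpu->cs = gate.cs;
    cpu->eip = gate.eip;
    return top;
}
```

The five saved words are exactly what a later iret pops, which is why the return address is out of the caller's reach.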

shawn 2878 days ago

Is there a nice xv6 equivalent for x86_64? How do MIT students learn about 64-bit arch?

https://aaronbloomfield.github.io/pdr/book/x86-64bit-ccc-cha...

This was good, but it leaves a lot out. No mention of kernel space.

sigjuice 2878 days ago

You will probably find an x86_64 port of xv6 on GitHub. IMHO, there is nothing terribly special about x86_64. The goal of xv6 is not to teach 64-bit computers, but to cover operating system basics (primarily multitasking, virtual memory and filesystems).
