by monocasa 2878 days ago
> It’s because args to syscalls are passed in registers rather than the stack. This is a security mechanism I believe, but I’m mostly guessing based on xv6.

It's probably for speed reasons. Marshaling arguments from user space is expensive because of all the checks you have to make so that user code can't crash the kernel.
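
To make the cost concrete, here is a rough sketch in C of the two conventions as seen from the kernel side. It is a toy model, not real kernel code: `user_mem`, `struct trap_frame`, `arg_from_regs`, and `arg_from_user_stack` are all made-up names, and "user memory" is just a flat array. A real kernel also has to cope with page faults during the copy, which this ignores.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model (assumption): user memory is one flat buffer, addresses are offsets. */
#define USER_MEM_SIZE 4096u
static uint8_t user_mem[USER_MEM_SIZE];

/* Hypothetical saved user state, as the kernel would see it on syscall entry. */
struct trap_frame {
    uint64_t arg[3];   /* argument registers, saved on entry */
    uint64_t user_sp;  /* user stack pointer, fully user-controlled */
};

/* Register convention: the value is already in the saved frame. No checks. */
static uint64_t arg_from_regs(const struct trap_frame *tf, int n)
{
    return tf->arg[n];
}

/* Stack convention: the value lives in user memory, so every fetch has to
 * validate a user-controlled address before touching it. */
static bool arg_from_user_stack(const struct trap_frame *tf, int n, uint64_t *out)
{
    uint64_t addr = tf->user_sp + (uint64_t)n * sizeof(uint64_t);

    if (addr < tf->user_sp)                      /* address arithmetic wrapped */
        return false;
    if (addr > USER_MEM_SIZE - sizeof(uint64_t)) /* read would leave user memory */
        return false;

    memcpy(out, &user_mem[addr], sizeof *out);
    return true;
}
```

The register path is a plain load from the saved frame. The stack path has to do the bounds checks on every argument, and on hardware where kernel access to user pages is slow, the copy itself costs extra on top of that.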

> Basically, if you want a kernel space and a user space, you have to ensure users can’t breach kernel space. But this is the part where my logic runs dry: could a malicious caller control the return address that’s pushed to the stack? If so, could you redirect the kernel’s execution to an arbitrary physical address? Or does the kernel switch back into user mode just before calling RET?

Return from interrupt uses the special iret instruction. It makes sure that the return happens in a user context, if need be, by atomically restoring the flags and ip registers (along with the code segment, which carries the privilege level) in a single instruction.
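
As a rough illustration, here is a C model of what iret does in 64-bit mode, where it pops RIP, CS, RFLAGS, RSP, and SS in that order. This is a simulation only: `struct cpu`, `struct iret_frame`, `simulated_iret`, and `privilege_level` are made-up names, and the real instruction performs many checks that are left out here.

```c
#include <stdint.h>

/* Hypothetical model of the CPU state that iret touches. */
struct cpu {
    uint64_t ip, flags, sp;
    uint16_t cs, ss;
};

/* Frame layout in 64-bit mode, lowest address first: RIP, CS, RFLAGS, RSP, SS.
 * The CPU pushes this onto the kernel stack when the interrupt is taken. */
struct iret_frame {
    uint64_t ip, cs, flags, sp, ss;
};

/* Everything is restored as one step: there is no window in which the new
 * ip is live while the old (kernel) privilege level is still in effect. */
static void simulated_iret(struct cpu *cpu, const struct iret_frame *f)
{
    cpu->ip    = f->ip;
    cpu->cs    = (uint16_t)f->cs;
    cpu->flags = f->flags;
    cpu->sp    = f->sp;
    cpu->ss    = (uint16_t)f->ss;
}

/* The current privilege level is the low two bits of CS (0 = kernel, 3 = user). */
static unsigned privilege_level(const struct cpu *cpu)
{
    return cpu->cs & 3u;
}
```

So a user can point the saved ip anywhere it likes, but the privilege level comes back from the same frame in the same instruction, and that frame sits on the kernel stack where user code cannot write to it. The jump lands in ring 3 no matter what the address is.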

2 comments

Yes, exactly this. Once upon a time (V6/V7 on the PDP-11), when I was younger, syscall parameters were on the stack. I worked for a company that ported Unix to various CPUs/MMUs; we'd knock one out every 6 weeks or so. On some MMUs, accessing user space from kernel space (safely) was extremely slow. We discovered that having syscalls pass parameters in on the stack was a real performance hog, and benchmarking showed that passing them in registers was far faster on all systems. Our systems supported both sorts of system calls. When I wrote the original 68k System V ABI, I included register passing as the default.

I'm learning PDP-11 assembly and would like to play around with some OS stuff. This was very helpful to know. Thanks.

To be utterly pedantic, on Intel Linux only the "legacy" 32-bit int 0x80 mechanism uses iret to return.

32-bit "fast" syscalls use sysenter/sysexit. 64-bit "fast" syscalls use syscall/sysret.
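
For the 64-bit case, here is a minimal sketch of a raw system call from C with inline assembly. It assumes x86-64 Linux and GCC or Clang; `raw_syscall3` is a made-up helper name. The syscall number goes in rax, the first three arguments in rdi, rsi, and rdx, and the syscall instruction itself overwrites rcx and r11.

```c
#include <sys/syscall.h>  /* SYS_getpid, SYS_write */
#include <unistd.h>

/* Hypothetical helper, x86-64 Linux only: a three-argument raw system call.
 * Nothing is pushed on the stack; all arguments travel in registers. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                           /* result comes back in rax */
        : "a"(nr), "D"(a1), "S"(a2), "d"(a3)  /* rax, rdi, rsi, rdx */
        : "rcx", "r11", "memory"              /* clobbered by the instruction */
    );
    return ret;
}
```

For example, `raw_syscall3(SYS_getpid, 0, 0, 0)` should return the same value as `getpid()`. Unlike the libc wrappers, the raw return value on failure is a negative errno code, not -1 with errno set.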

Haven't really looked, but I suspect sysexit and sysret are somewhat special-cased versions of iret.

https://blog.packagecloud.io/eng/2016/04/05/the-definitive-g...