by vbezhenar 830 days ago
A userspace program directly uses the CPU and memory (unless you're using a VM). In contrast, your userspace program does not directly access your network device or SSD; it uses kernel routines to access them indirectly.

It doesn't directly access memory. The addresses in your userspace program are not the actual addresses of memory in the RAM sticks; there is a table of mappings that the kernel sets up. When your process asks the kernel for memory, it says "I need 5KB, please put that at address XYZ". The kernel goes and finds 5KB unused, probably at some other address ABC, and creates a mapping in the table that says XYZ translates to ABC. Then the kernel sets the CPU's MMU to use that table for your process and switches back to unprivileged mode, letting your process run again. Your process, in unprivileged mode, issues an instruction to write to the memory at XYZ, but the CPU translates that address to ABC and writes there instead.
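The translation described above can be modelled with a toy, single-level page table. This is a simplified sketch, not how any particular kernel or MMU lays out its tables: the 4 KiB page size, the `PageTable` class, and its method names are assumptions made for illustration, and real x86-64 or ARM MMUs walk multi-level tables in hardware.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    """Raised when a virtual address has no mapping (real hardware would trap to the kernel)."""


class PageTable:
    """Toy single-level page table: virtual page number -> physical frame number."""

    def __init__(self):
        self.entries = {}

    def map(self, virt_page, phys_frame):
        # The kernel's job: record that virtual page XYZ is backed by physical frame ABC.
        self.entries[virt_page] = phys_frame

    def translate(self, vaddr):
        # The MMU's job: split the address into page and offset, look up the page, keep the offset.
        virt_page, offset = divmod(vaddr, PAGE_SIZE)
        if virt_page not in self.entries:
            raise PageFault(hex(vaddr))
        return self.entries[virt_page] * PAGE_SIZE + offset


# The process asked for memory at virtual page 0x10; the kernel found free frame 0x7a.
table = PageTable()
table.map(0x10, 0x7A)
print(hex(table.translate(0x10 * PAGE_SIZE + 0x123)))  # 0x7a123
```

With 4 KiB pages, a 5KB request like the one in the comment would need two page-table entries, since it does not fit in a single page.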

VMs (well, not emulated VMs, but if you're doing an x86 VM on x86 or an ARM VM on ARM) do something similar. An inaccurate (but conceptually useful) way to think of it is that the CPU does two layers of MMU translation for user processes in a VM.
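The "two layers of MMU" picture can be sketched as two lookups composed together. On x86 this second stage is what Intel's EPT and AMD's nested paging provide in hardware. Like the comment says, this is inaccurate in the details: real hardware also has to translate the guest's own page-table pointers through the second stage and caches results in TLBs. The dict-based tables and the `translate`/`guest_access` names here are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(table, addr):
    """One level of translation: look up the page, keep the offset."""
    page, offset = divmod(addr, PAGE_SIZE)
    return table[page] * PAGE_SIZE + offset  # a KeyError stands in for a fault


def guest_access(guest_vaddr, guest_page_table, host_page_table):
    # Stage 1: the guest kernel's table maps guest-virtual -> guest-"physical".
    guest_paddr = translate(guest_page_table, guest_vaddr)
    # Stage 2: the hypervisor's table maps guest-"physical" -> real host-physical.
    return translate(host_page_table, guest_paddr)


guest_pt = {0x1: 0x5}  # guest virtual page 1 -> guest physical page 5
host_pt = {0x5: 0x9C}  # guest physical page 5 -> host physical frame 0x9c
print(hex(guest_access(0x1ABC, guest_pt, host_pt)))  # 0x9cabc
```

The guest kernel only ever sees and edits `guest_pt`; the hypervisor controls `host_pt`, which is how it keeps the guest confined without emulating every instruction.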

Kernel code isn't running when your program accesses memory, but the kernel has set up the CPU so that it still controls your memory, and you only have access to what the kernel allows. It's mediated by the kernel.

I think the point of the diagram is that the abstractions or “API” that user space gets to use includes memory it can read and write directly, and a CPU to execute instructions. Of course in reality there only “appears to be” memory and a CPU, but that’s why it’s an abstraction. Just like there “appears to be” a filesystem for user space to use, when in reality there’s a block interface to a disk, or wherever you want to draw the line.
I think what I was getting at was that memory sort of sits in-between.

My instructions are executed directly on the CPU. My file reads and writes are translated from a stream of bytes into block instructions by code in the kernel.

Memory is a weird in-between place, or maybe a third option: the kernel has to run a bunch of code on my behalf before I can use memory, sort of like the filesystem case, but afterward I'm using the hardware units directly, sort of like instructions.
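The kernel-side work for files, turning a byte-oriented read or write into block-sized device requests, is at its core simple arithmetic. This sketch assumes a 4096-byte block and a file stored contiguously from block 0; a real filesystem would also consult its own metadata to find where each block actually lives on disk. `blocks_for_range` is a hypothetical name.

```python
BLOCK_SIZE = 4096  # assumed block size; real devices and filesystems vary


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Return the block indices touched by a read/write of `length` bytes at `offset`.

    Assumes the file is laid out contiguously starting at block 0.
    """
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


# A 100-byte read that straddles a block boundary touches two blocks.
print(blocks_for_range(4050, 100))  # [0, 1]
```

This runs on every file access, whereas for memory the equivalent bookkeeping is done once up front and the hardware handles each later access on its own.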

The CPU is pre-configured by the OS as well, so IMO it's the same as memory. Maybe it would be more accurate to say that a userspace program directly accesses some parts of the CPU and RAM.

With modern computers, AFAIK even the OS does not have absolutely full control over the CPU.

But they don't directly access memory. They access an address that the MMU may translate to a physical location in RAM. They may also write to an address for which the kernel hasn't allocated a page yet, but which the kernel has agreed to map into the process's memory. In this case the kernel handles the trap, maps in a page of actual RAM (etc.), and then the process continues making forward progress.
That's just the way the CPU works. It has nothing to do with kernel versus userspace; kernel code behaves identically.
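The lazy mapping described above (often called demand paging) can be sketched as a toy address space where pages are promised up front but only backed by a frame on first touch. Everything here is hypothetical: the `LazyAddressSpace` class, the sequential frame allocator, and the 4 KiB page size are simplifications, not any real kernel's data structures.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class SegFault(Exception):
    """Access to an address the kernel never agreed to map."""


class LazyAddressSpace:
    """Toy demand paging: pages are reserved up front but backed only on first touch."""

    def __init__(self):
        self.reserved = set()  # virtual pages the kernel promised (e.g. via mmap)
        self.page_table = {}   # virtual page -> physical frame, filled lazily
        self.next_free_frame = 0
        self.faults = 0

    def reserve(self, virt_page):
        self.reserved.add(virt_page)

    def _handle_fault(self, virt_page):
        # Kernel trap handler: check the promise, then allocate and map a frame.
        if virt_page not in self.reserved:
            raise SegFault(hex(virt_page * PAGE_SIZE))
        self.faults += 1
        self.page_table[virt_page] = self.next_free_frame
        self.next_free_frame += 1

    def access(self, vaddr):
        virt_page, offset = divmod(vaddr, PAGE_SIZE)
        if virt_page not in self.page_table:
            self._handle_fault(virt_page)  # the process is paused here...
        # ...and resumes as if the memory had always been there.
        return self.page_table[virt_page] * PAGE_SIZE + offset


space = LazyAddressSpace()
space.reserve(0x40)
space.access(0x40 * PAGE_SIZE)      # first touch: page fault, frame allocated
space.access(0x40 * PAGE_SIZE + 8)  # second touch: already mapped, no fault
print(space.faults)  # 1
```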
> With modern computers, AFAIK even OS does not have absolutely full control over CPU.

It's even more amusing with Type 2 hypervisors.

If you're using a hypervisor, then the userspace program inside the VM is also using the CPU and memory directly. You'd have to do full emulation to avoid that.

Even with full emulation, I'd say memory is being accessed directly, unless you really go out of your way to make it weird.

With full emulation, I'd argue memory access is not direct. Memory access from the emulated system will go through user space code in the emulator. That code may translate it into an actual memory access, or perhaps into an access to an emulated memory-mapped I/O device like a frame buffer. Either way, there is something in the middle.
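To illustrate the "something in the middle", here is a toy memory bus of the kind a full emulator might route guest loads and stores through. It is a sketch under assumptions: the `Bus` and `Framebuffer` classes, the address layout, and the byte-wide accesses are all hypothetical, and real emulators use much faster dispatch than an `if` chain per access.

```python
class Framebuffer:
    """Hypothetical memory-mapped display device."""

    def __init__(self, size):
        self.pixels = bytearray(size)
        self.dirty = False

    def write(self, offset, value):
        self.pixels[offset] = value
        self.dirty = True  # a real emulator might schedule a redraw here


class Bus:
    """Every guest load/store passes through this emulator code."""

    def __init__(self, ram_size, fb_base, fb_size):
        self.ram = bytearray(ram_size)
        self.fb_base = fb_base
        self.fb = Framebuffer(fb_size)

    def _in_fb(self, addr):
        return self.fb_base <= addr < self.fb_base + len(self.fb.pixels)

    def write(self, addr, value):
        if self._in_fb(addr):
            self.fb.write(addr - self.fb_base, value)  # emulated MMIO device
        elif 0 <= addr < len(self.ram):
            self.ram[addr] = value                     # plain emulated RAM
        else:
            raise ValueError(f"bus error at {addr:#x}")

    def read(self, addr):
        if self._in_fb(addr):
            return self.fb.pixels[addr - self.fb_base]
        if 0 <= addr < len(self.ram):
            return self.ram[addr]
        raise ValueError(f"bus error at {addr:#x}")


bus = Bus(ram_size=0x1000, fb_base=0x8000, fb_size=0x100)
bus.write(0x10, 0xAB)    # lands in the RAM bytearray
bus.write(0x8004, 0xFF)  # lands in the framebuffer device
```

From the guest's point of view both writes look identical; only the emulator's code decides which one is "memory".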

You could argue that nothing is direct unless you're running on a bare metal system, no MMU, no page tables. How do you define "direct"?

For the definition I used earlier, the bytes don't change shape in between. That makes MMU and interpreted access direct, and compression indirect.
Ring 3 is not "directly using the CPU." And mmap is not "directly using the memory."
Hardware ring has nothing to do with "directly using the CPU", it controls what access level the program has.

Forget virtualisation. Compile a userspace program which just adds numbers into a stack variable. That program is running directly on the CPU in the unprivileged ring.

A userspace program in a VT-x virtual machine is exactly the same.

If those programs attempt privileged access then that access will fail and a trap is raised. That's what the CPU ring controls.

> Hardware ring has nothing to do with "directly using the CPU"

Why wouldn't it? Several features are simply not available in ring 3. Several features are configured for you in a way you cannot change. Several instructions will simply fault your program.

> which just adds numbers into a stack variable

Yes... and when you eventually overflow that stack, what happens? How did the stack segment selector get created? Can you change that selector or its attributes? Can you set the stack pointer to any valid memory address you like?

> A userspace program in a VT-x virtual machine is exactly the same.

What does an IOMMU do?

> If those programs attempt privileged access then that access will fail and a trap is raised.

Right... so you are not directly using the CPU. You're not even in control of what timeslices are afforded to you by the OS. You are in an exceptionally limited environment, most of which you cannot control or alter and much of which you cannot even observe.

The fact that instructions get dispatched according to the system ABI when you run a program is not material to this problem, and in particular, is not at all correctly represented by this diagram.

You are directly using the CPU; you just do not have full access to the entire CPU. There is no userspace ALU that your numbers get crunched on, and there is no userspace register file your working set is stored in (actually, they might do that internally, but logically there is no such distinction). You are in a hotel room. Just because you cannot stomp around in the ducts does not mean you are not directly using the hotel room; you just have limited access to the rest of the hotel.
You don't have administrative access to all of Hacker News. Therefore, you are not really on Hacker News. This is your logic.