Hacker News
by dfc 831 days ago
My mental model of Linux does not have the CPU/Memory in user space. What am I missing?

Nor does mine.

Userspace assembly runs directly on the CPU*, executing in the unprivileged ring. When the userspace program makes a system call (typically through a libc wrapper that executes a dedicated instruction such as syscall on x86-64), part of that entry into kernelspace is switching the CPU into the privileged ring, and kernel assembly then runs on the CPU.
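A toy Python model of that ring switch may help. It is a sketch, not how Linux works internally. The ToyCPU class, its two modes, the Trap exception and the attempt helper are all hypothetical names. "user" stands in for ring 3 and "kernel" for ring 0.

```python
class Trap(Exception):
    """Raised when user-mode code tries a privileged operation."""


class ToyCPU:
    def __init__(self):
        self.mode = "user"  # "user" ~ ring 3, "kernel" ~ ring 0
        self.log = []

    def add(self, a, b):
        # Plain arithmetic runs the same in either mode; no kernel involved.
        return a + b

    def halt(self):
        # Loosely modeled on x86 HLT: only allowed in kernel mode.
        if self.mode != "kernel":
            raise Trap("privileged instruction in user mode")
        self.log.append("halted")

    def syscall(self, handler, *args):
        # Enter the kernel: switch mode, run the kernel's handler,
        # then drop back to user mode before returning to the caller.
        self.mode = "kernel"
        self.log.append("enter kernel")
        try:
            return handler(self, *args)
        finally:
            self.mode = "user"
            self.log.append("return to user")


def attempt(fn):
    """Run fn and report whether it completed or trapped."""
    try:
        fn()
        return "ok"
    except Trap:
        return "trap"


cpu = ToyCPU()
print(cpu.add(2, 3))                  # user code runs directly: 5
print(attempt(cpu.halt))              # privileged op from user mode: trap
print(cpu.syscall(lambda c: c.mode))  # handler runs in kernel mode: kernel
print(cpu.mode)                       # back in user mode afterwards: user
```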

The process scheduler can stop execution to kick a task off the CPU and switch to another one. Depending on the OS and kernel, some things can be kicked off the CPU and some cannot.
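Here is a minimal sketch of that kind of preemption. It assumes a plain round-robin policy with a fixed timeslice. Real Linux schedulers are far more involved, and real preemption is driven by timer interrupts rather than a loop counter. Tasks are modeled as Python generators, and task and round_robin are hypothetical names.

```python
from collections import deque


def task(name, steps):
    """A toy task: each yield is one unit of work done on the CPU."""
    for i in range(steps):
        yield f"{name}{i}"


def round_robin(tasks, timeslice):
    """Run tasks round-robin, preempting each after `timeslice` units of work."""
    ready = deque(tasks)
    trace = []
    while ready:
        current = ready.popleft()
        finished = False
        for _ in range(timeslice):
            try:
                trace.append(next(current))
            except StopIteration:
                finished = True  # task exited; drop it
                break
        if not finished:
            ready.append(current)  # timeslice used up: kicked off the CPU, requeued
    return trace


print(round_robin([task("A", 3), task("B", 2)], timeslice=2))
# ['A0', 'A1', 'B0', 'B1', 'A2']
```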

Userspace memory allocations are backed by virtual memory: the page tables record how virtual memory pages map to physical memory pages, and the MMU uses them to translate addresses.

The kernel is involved during allocation and page faults, but IIUC a regular, successful virtual memory access is handled entirely in hardware.
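Roughly, the translation the MMU performs in hardware on each access looks like the toy Python function below. It assumes 4 KiB pages and a single-level table held in a dict. Real x86-64 hardware walks a multi-level table and caches results in the TLB. translate and PageFault are hypothetical names.

```python
PAGE_SHIFT = 12                 # assume 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class PageFault(Exception):
    pass


def translate(page_table, vaddr):
    """Toy single-level translation: virtual address -> physical address."""
    vpn = vaddr >> PAGE_SHIFT            # virtual page number
    offset = vaddr & (PAGE_SIZE - 1)     # byte offset within the page
    if vpn not in page_table:
        # No valid mapping: the hardware traps and the kernel's fault handler runs.
        raise PageFault(hex(vaddr))
    return (page_table[vpn] << PAGE_SHIFT) | offset


page_table = {0x10: 0x2A}                    # virtual page 0x10 -> physical frame 0x2A
print(hex(translate(page_table, 0x10123)))   # 0x2a123
```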

I don't have a diagram of how this works. Neither processes nor memory are my usual area of kernel.

You'd be better off reading the x86 version of the XV6 book to learn how this stuff really works. It's really well written and implements enough to be tangibly useful. Reading the code is optional when just learning concepts. Reading the XV6 code will hopefully help you understand the O'Reilly Linux books better, which will hopefully help you understand the actual Linux kernel better.

(*Yes, I'm aware CPUs don't directly execute assembly anymore, but the microcode guarantees that the observable CPU state at any instruction pointer matches what you'd expect if you were running assembly on a PDP or C64, or close enough for 99.999% of purposes and definitely enough for debugging your program in gdb.)

Do you happen to have any other suggestions about reading?

I am currently reading "Asynchronous Programming in Rust: Learn asynchronous programming by building working examples of futures, green threads, and runtimes", and there is vague talk about how CPUs process things, but I would really like to know more. I am even curious about what is actually happening in hardware. It seems hard to determine where to start.

I don't have formal education in this field unfortunately.

> execute assembly

I always thought that assembly was a type of programming language.

It is. From my understanding, CPUs execute machine code. Assembly has to be passed through an assembler to get machine code, and that assembler can make other changes as well, so they are not always one-to-one. Written assembly will usually translate very closely to machine code, though.
Ben Eater's excellent educational video series includes one explaining the difference:

https://www.youtube.com/watch?v=oO8_2JJV0B4

In short, assembly for a given CPU is very nearly one-to-one with the machine language for that CPU. It's not correct to conflate the two, but close enough when speaking informally.
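To show how close to one-to-one it can be, here is a toy table-driven assembler for a handful of real x86-64 instructions that each encode as a single opcode byte. The OPCODES table and assemble function are made up for this sketch. Real instructions with operands need much more elaborate encoding.

```python
# A few real x86-64 instructions that each encode as one opcode byte.
OPCODES = {
    "nop": 0x90,
    "push rax": 0x50,
    "pop rax": 0x58,
    "int3": 0xCC,
    "hlt": 0xF4,
    "ret": 0xC3,
}


def assemble(lines):
    """Toy assembler: one mnemonic in, one machine-code byte out."""
    return bytes(OPCODES[line.strip()] for line in lines)


print(assemble(["push rax", "pop rax", "ret"]).hex())  # 5058c3
```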

While it's true that assembler doesn't quite fit into the same class as compiled/interpreted languages, it would be disingenuous to say it's not a programming language. It's simply a very low-level, machine specific one.

It's even blurrier when you consider that most modern assembler dialects have plenty of high-level functionality (structs, macros, labels, etc.) that doesn't correspond directly to machine instructions.
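Labels are a good example. They emit no bytes at all, and the assembler has to work out the jump distances for you. Below is a minimal two-pass sketch. It supports only nop, ret and the short jmp form, which is opcode EB followed by a signed 8-bit displacement measured from the end of the jump. assemble_with_labels is a hypothetical name.

```python
SIMPLE = {"nop": 0x90, "ret": 0xC3}   # one-byte instructions
JMP_SHORT = 0xEB                      # jmp rel8: 2 bytes total


def assemble_with_labels(lines):
    """Two-pass toy assembler: labels emit no bytes, jumps get computed offsets."""
    # Pass 1: record the byte offset of every label.
    labels, addr = {}, 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        elif line.startswith("jmp "):
            addr += 2
        else:
            addr += 1
    # Pass 2: emit bytes, resolving jump targets.
    out, addr = bytearray(), 0
    for line in lines:
        if line.endswith(":"):
            continue
        if line.startswith("jmp "):
            rel = labels[line[4:]] - (addr + 2)  # relative to the next instruction
            if not -128 <= rel <= 127:
                raise ValueError("target out of range for a short jmp")
            out += bytes([JMP_SHORT, rel & 0xFF])  # two's-complement rel8
            addr += 2
        else:
            out.append(SIMPLE[line])
            addr += 1
    return bytes(out)


print(assemble_with_labels(["loop:", "jmp loop"]).hex())              # ebfe
print(assemble_with_labels(["jmp end", "nop", "end:", "ret"]).hex())  # eb0190c3
```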

The GP uses slightly weird language, but mixing up assembly and machine code in informal speech isn't rare at all.
Here I'm referring to assembly as a mnemonic for machine code, but yes, it would have been more correct to say machine code instead.
A userspace program directly uses the CPU and memory (unless you're using a VM). In contrast, your userspace program does not directly access your network device or SSD; it uses kernel routines to access them indirectly.
It doesn't directly access memory. The addresses in your userspace program are not the actual addresses of memory in the RAM sticks; there is a table of mappings that the kernel sets up. When your process asks the kernel for memory, it says "I need 5KB, please put that at address XYZ". The kernel goes and finds 5KB unused, probably at some other address ABC, and creates a mapping in the table that says XYZ translates to ABC. Then the kernel sets the CPU's MMU to use the table for your process and switches back to unprivileged mode, letting your process run again. When your process, still in unprivileged mode, issues an instruction to write to the memory at XYZ, the CPU translates that address to ABC and writes there instead.
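The "sets the MMU to use the table for your process" step can be sketched in toy Python, assuming 4 KiB pages. Each process has its own table, and the kernel points the MMU at the right one on a context switch. On x86 this is done by loading the CR3 register. ToyMMU, tables and context_switch are hypothetical names, and the frame numbers are arbitrary.

```python
PAGE_SHIFT = 12  # assume 4 KiB pages


class ToyMMU:
    """Translates through whichever page table the kernel last installed."""

    def __init__(self):
        self.table = None

    def translate(self, vaddr):
        vpn, offset = vaddr >> PAGE_SHIFT, vaddr & ((1 << PAGE_SHIFT) - 1)
        return (self.table[vpn] << PAGE_SHIFT) | offset


# One table per process: the same virtual page ("XYZ") maps to a different
# physical frame ("ABC") in each process.
tables = {
    "proc_a": {0x400: 0x1234},
    "proc_b": {0x400: 0x5678},
}


def context_switch(mmu, pid):
    # Kernel-mode step: point the MMU at the next process's table.
    mmu.table = tables[pid]


mmu = ToyMMU()
context_switch(mmu, "proc_a")
print(hex(mmu.translate(0x400010)))  # 0x1234010
context_switch(mmu, "proc_b")
print(hex(mmu.translate(0x400010)))  # 0x5678010
```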

VMs (well, not emulated VMs, but if you're running an x86 VM on x86 or an ARM VM on ARM) do something similar: an inaccurate (but conceptually useful) way to think of it is that the CPU does two layers of MMU translation for user processes in a VM.
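The two layers can be sketched as two table lookups in a row. This is a simplification that assumes 4 KiB pages and single-level tables. On real hardware the second layer is Intel's EPT or AMD's NPT, and even the guest's own page-table reads go through it. walk and nested_translate are hypothetical names.

```python
PAGE_SHIFT = 12                     # assume 4 KiB pages at both layers
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def walk(table, addr):
    """One layer of toy translation: page number lookup, offset kept."""
    return (table[addr >> PAGE_SHIFT] << PAGE_SHIFT) | (addr & OFFSET_MASK)


def nested_translate(guest_table, host_table, guest_vaddr):
    # Layer 1 (guest kernel's table): guest-virtual -> guest-physical.
    guest_paddr = walk(guest_table, guest_vaddr)
    # Layer 2 (hypervisor's table): guest-physical -> host-physical.
    return walk(host_table, guest_paddr)


guest_table = {0x1: 0x7}   # guest page 0x1 -> guest frame 0x7
host_table = {0x7: 0x99}   # guest frame 0x7 -> host frame 0x99
print(hex(nested_translate(guest_table, host_table, 0x1ABC)))  # 0x99abc
```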

The kernel code isn't running directly when your program accesses memory, but it sets up the CPU so that the kernel still has control over your memory, and you only have access to what the kernel allows. It's mediated by the kernel.

I think the point of the diagram is that the abstractions or “API” that user space gets to use includes memory it can read and write directly, and a CPU to execute instructions. Of course in reality there only “appears to be” memory and a CPU, but that’s why it’s an abstraction. Just like there “appears to be” a filesystem for user space to use, when in reality there’s a block interface to a disk, or wherever you want to draw the line.
I think what I was getting at was that memory sort of sits in-between.

My instructions are executed directly on the CPU. My file reads and writes are translated from a stream of bytes into block operations by code in the kernel.

Memory is a weird in-between place, or maybe a third option: the kernel has to run a bunch of code on my behalf before I can use memory, sort of like the filesystem case, but afterward I'm using the hardware units directly, sort of like instructions.

The CPU is pre-configured by the OS as well, so IMO it's the same as memory. Maybe it would be more appropriate to say that a userspace program directly accesses some parts of the CPU and RAM.

With modern computers, AFAIK even the OS does not have absolutely full control over the CPU.

But they don't directly access memory. They access an address that may be translated by the MMU to a physical location in RAM. They may also write to an address that the kernel hasn't allocated a page for yet, but that the kernel has agreed to map into the process's memory. In this case the kernel handles the trap, maps in a page of actual RAM (etc.), and then the process continues making forward progress.
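Demand paging like this can be sketched in a few lines of Python. Everything here is hypothetical and simplified. reserve stands in for something like mmap, frames come from a counter, and a Segfault exception stands in for SIGSEGV.

```python
PAGE_SHIFT = 12                      # assume 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT


class Segfault(Exception):
    """Toy stand-in for SIGSEGV."""


class ToyKernel:
    def __init__(self):
        self.reserved = set()   # virtual pages the process may use
        self.page_table = {}    # vpn -> frame, filled in lazily
        self.next_frame = 100   # arbitrary toy frame allocator
        self.faults = 0

    def reserve(self, vaddr, length):
        # Like mmap in spirit: promise the range, allocate no frames yet.
        first = vaddr >> PAGE_SHIFT
        last = (vaddr + length - 1) >> PAGE_SHIFT
        self.reserved.update(range(first, last + 1))

    def handle_fault(self, vpn):
        if vpn not in self.reserved:
            raise Segfault(hex(vpn << PAGE_SHIFT))
        self.page_table[vpn] = self.next_frame   # map a real frame now
        self.next_frame += 1
        self.faults += 1

    def access(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.page_table:
            self.handle_fault(vpn)   # trap into the kernel, then retry
        return (self.page_table[vpn] << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1))


k = ToyKernel()
k.reserve(0x10000, 2 * PAGE_SIZE)   # two pages promised, none allocated
print(k.page_table)                 # {}
print(hex(k.access(0x10004)))       # first touch faults: 0x64004
print(hex(k.access(0x10008)))       # same page, no new fault: 0x64008
print(k.faults)                     # 1
```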
> With modern computers, AFAIK even OS does not have absolutely full control over CPU.

It's even more amusing with Type 2 hypervisors.

If you're using a hypervisor, then the userspace program inside the VM is also using the CPU and memory directly. You'd have to do full emulation to avoid that.

Even with full emulation, I'd say memory is being accessed directly, unless you really go out of your way to make it weird.

With full emulation, I'd argue memory access is not direct. Memory access from the emulated system will go through user space code in the emulator. That code may translate it to actual memory access, or perhaps an emulated, memory mapped I/O device like a frame buffer. Either way, there is something in the middle.
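Here is a toy illustration of that "something in the middle". In an emulator, every guest load and store is a function call in user-space code that decides where the access goes. EmulatedBus, its address ranges and the framebuffer device are assumptions for this sketch, not any particular emulator's design.

```python
class EmulatedBus:
    """Toy emulator bus: every guest load/store is a call into this code."""

    def __init__(self, ram_size, fb_base, fb_size):
        self.ram = bytearray(ram_size)
        self.fb_base, self.fb_size = fb_base, fb_size
        self.framebuffer = bytearray(fb_size)  # emulated memory-mapped device

    def _route(self, addr):
        # Decide what the guest "address" actually refers to.
        if self.fb_base <= addr < self.fb_base + self.fb_size:
            return self.framebuffer, addr - self.fb_base
        if 0 <= addr < len(self.ram):
            return self.ram, addr
        raise IndexError(f"bus error at {addr:#x}")

    def store(self, addr, value):
        target, index = self._route(addr)
        target[index] = value & 0xFF

    def load(self, addr):
        target, index = self._route(addr)
        return target[index]


bus = EmulatedBus(ram_size=0x1000, fb_base=0x8000, fb_size=0x100)
bus.store(0x10, 7)      # lands in emulated RAM
bus.store(0x8001, 9)    # lands in the emulated framebuffer
print(bus.ram[0x10], bus.framebuffer[1], bus.load(0x8001))  # 7 9 9
```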

You could argue that nothing is direct unless you're running on a bare metal system, no MMU, no page tables. How do you define "direct"?

For the definition I used earlier, the bytes don't change shape in between. That makes MMU and interpreted access direct, and compression indirect.
Ring 3 is not "directly using the CPU." And mmap is not "directly using the memory."
Hardware ring has nothing to do with "directly using the CPU", it controls what access level the program has.

Forget virtualisation. Compile a userspace program which just adds numbers into a stack variable. That program is running directly on the CPU in the unprivileged ring.

A userspace program in a VT-x virtual machine is exactly the same.

If those programs attempt privileged access then that access will fail and a trap is raised. That's what the CPU ring controls.

> Hardware ring has nothing to do with "directly using the CPU"

Why wouldn't it? Several features are simply not available in ring 3. Several features are configured for you in a way you cannot change. Several instructions will simply fault your program.

> which just adds numbers into a stack variable

Yes... and when you eventually overflow that stack, what happens? How did the stack segment selector get created? Can you change that selector or its attributes? Can you set the stack pointer to any valid memory address you like?

> A userspace program in a VT-x virtual machine is exactly the same.

What does an IOMMU do?

> If those programs attempt privileged access then that access will fail and a trap is raised.

Right.. so you are not directly using the CPU. You're not even in control of what timeslices are afforded to you by the OS. You are in an exceptionally limited environment most of which you cannot control or alter and much of which you cannot even observe.

The fact that instructions get dispatched according to the system ABI when you run a program is not material to this problem, and in particular, is not at all correctly represented by this diagram.

You are directly using the CPU, you just do not have full access to the entire CPU. There is no userspace ALU that your numbers get crunched on, there is no userspace register file your working set is stored in (actually, they might do that internally, but logically there is no such distinction). You are in a hotel room. Just because you can not stomp around in the ducts does not mean you are not directly using the hotel room, you just have limited access to the rest of the hotel.
You don't have administrative access to all of Hacker News. Therefore, you are not really on Hacker News. This is your logic.