by billti 1253 days ago
Amongst the first sentences...

> It may be thought of as what happens when a whole computer starts, since the CPU is the center of the computer and the place where the action begins.

I thought that too. Last year I spent a while getting as low-level as I could: trying to understand how to write a boot loader and a kernel, and learning about clocks, pins, interrupts, etc. I thought, "I know, I'll get a Raspberry Pi! That way even if I brick it I won't have wasted too much money".

Turns out the Raspberry Pi (and, I'm guessing, many other systems) is pretty confusing to understand at boot time. For one, it's the GPU that actually does the initial boot process, and good information on much of that is hard to find. (https://raspberrypi.stackexchange.com/questions/14862/why-do...)

I spent many, many hours reading various specs & docs and watching tons of low-level YouTube videos. Compared to software development higher up the stack (my usual area), I found the material surprisingly sparse and, for the most part, poor. (Maybe that reflects the size of the audience and the value in producing it.)


Yeah, modern hardware is crazy hard to understand, and a lot of it is proprietary/trademarked stuff like you saw on the RPi. Arduino is a good start, but it's so low level that it doesn't get you much closer to understanding how a modern system works.

One slightly weird but fascinating path I have been playing with is writing programs for old game consoles, particularly the Nintendo DS. You can get full and comprehensive hardware documentation for these consoles, and there are now easy ways to run your own code on them, as well as libraries/tooling around it. But they run no OS; your program runs directly on the hardware, so you get a good feel for low-level programming without going all the way down to the level of Atmel chips.

It can be a little hard to work out how to get started, but it's really as simple as setting up `libnds` from devkitPro and then either hacking a DSi to run the twilight firmware or buying a cheap flashcard from eBay to run your own programs. Read the example programs from the devkitPro GitHub and some posts on the hardware, and you'll get the hang of it.

> writing programs for old game consoles,

Oh yeah, like the Intellivision.

> particularly the Nintendo DS.

Oh come on, the DS ain't that old.

I've never done anything with them, but I think the Game Boy family is a good choice. Unlike the NES or SNES, which rely heavily on assembly, you can use C for the GBA. There are also great communities devoted to them.
The NES is a simple but extremely elegant machine. Assembly isn't a problem, since the CPU is very simple; I recommend playing with it.
Definitely. I guess one can climb the ladder by first programming the NES and then SNES and just go up, maybe even learn some Japanese to immerse in the life of a Japanese game developer in the 80s/90s. Could be a long but interesting side project.
You can, but it’s pretty lacking in registers.
It's still not a PIC12.
20 years next year.
I haven't even reached 30, yet that comment of yours made me feel old.
It's probably more confusing since the DS had several iterations: the first one in 2004, then the Lite in 2006, the DSi in 2008, and the DSi XL in 2009. And the 3DS having mostly the same form factor probably made the design feel new for a lot longer.
It did launch in 04/05, but it's only two console generations older than the Switch.
Considering the Switch is at the end of its lifespan, I'd say that's pretty old. It's like saying, seven years into the PS3's lifespan, that the PS1 isn't old. It's old.
> But they run no OS, your program runs directly on the hardware

How is threading/time scheduling/interrupt handling/context switching usually done?

I dunno about the DS, which doesn't really qualify as that old to me; it's got a pretty decent-sized ROM for when you don't have a cartridge in, and I'd guess the SDK gives you something approaching an OS, maybe even with a threading library.

But on really old hardware, you're not going to run threads or a scheduler; you're going to run one iteration of your game loop, then wait for a sign that it's time to do your graphics work (after VBlank, during the screen scan on the Atari 2600; mostly during VBlank on more generous platforms). If the platform doesn't have many interrupts, it probably has a fixed address for the interrupt vectors, and your ROM would cover that address or those addresses; there's probably also a fixed address that execution starts at, or it's one of the entries in the interrupt vector table.
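A minimal simulated sketch of that "one loop iteration, then wait for VBlank" pattern in C. Assumptions: GBA-like timing (160 visible scanlines out of 228), and a plain counter standing in for the memory-mapped scanline register a real console would expose; all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical, GBA-like timing: 160 visible scanlines, 228 in total.
   On real hardware the current scanline is a memory-mapped register that
   the video chip updates; here a counter stands in for it. */
enum { VISIBLE_LINES = 160, TOTAL_LINES = 228 };

static unsigned scanline;      /* simulated "current scanline" register */
static unsigned frames_drawn;  /* how many times we did graphics work   */

/* Stand-in for the passage of time: the video chip moves to the next line. */
static void hardware_tick(void) {
    scanline = (scanline + 1) % TOTAL_LINES;
}

static int in_vblank(void) { return scanline >= VISIBLE_LINES; }

/* Wait for the *start* of the next VBlank: first let any VBlank we are
   already in finish, then wait for the beam to leave the visible area.
   On hardware the loop bodies would be empty (or a halt until the VBlank
   interrupt); here they advance the simulated clock. */
static void wait_for_vblank(void) {
    while (in_vblank()) hardware_tick();
    while (!in_vblank()) hardware_tick();
}

static void update_game_state(void) { /* input, physics, AI ... */ }
static void draw_frame(void) { frames_drawn++; /* copy sprites/tiles to video memory */ }

/* One iteration per frame: simulate, wait, then touch video memory. */
static void run_frames(unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        update_game_state();
        wait_for_vblank();
        draw_frame();
    }
}
```

Usage: `run_frames(3);` leaves `frames_drawn == 3`, with the simulated beam sitting on the first VBlank line.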

Context switching isn't really that hard anyway: call into (or get interrupted into) a routine that saves the registers to the stack, saves the stack pointer for the current task, restores the stack pointer for another task, pops that task's registers from the stack, and returns.
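The steps above can be simulated in C with a toy CPU. This is a sketch under stated assumptions: four registers, a word-addressed memory array, and a hypothetical `Task` that only stores its saved stack pointer. Real code would be a few lines of assembly, and the return address would ride along on the stack via call/return.

```c
#include <assert.h>
#include <stdint.h>

/* A toy CPU: four general-purpose registers plus a stack pointer that
   indexes into a flat word-addressed memory. Everything is simulated. */
enum { NREGS = 4, MEM_WORDS = 256 };

typedef struct {
    uint32_t r[NREGS];
    uint32_t sp;          /* full-descending stack: grows toward 0 */
} Cpu;

typedef struct {
    uint32_t saved_sp;    /* the only per-task state kept off the stack */
} Task;

static uint32_t mem[MEM_WORDS];

static void push(Cpu *c, uint32_t v) { assert(c->sp > 0); mem[--c->sp] = v; }
static uint32_t pop(Cpu *c) { assert(c->sp < MEM_WORDS); return mem[c->sp++]; }

/* Save registers on the current task's stack, remember its stack pointer,
   switch to the next task's stack, and pop that task's registers. */
static void context_switch(Cpu *c, Task *from, Task *to) {
    for (int i = 0; i < NREGS; i++) push(c, c->r[i]);
    from->saved_sp = c->sp;
    c->sp = to->saved_sp;
    for (int i = NREGS - 1; i >= 0; i--) c->r[i] = pop(c);
}

/* Prepare a task that has never run: give it a stack region and push an
   initial register image so the first switch into it has something to pop. */
static void task_init(Cpu *c, Task *t, uint32_t stack_top, const uint32_t init[NREGS]) {
    uint32_t saved = c->sp;
    c->sp = stack_top;
    for (int i = 0; i < NREGS; i++) push(c, init[i]);
    t->saved_sp = c->sp;
    c->sp = saved;
}
```

Switching from task A to a freshly initialised task B loads B's initial registers; switching back restores A's exactly, because they were parked on A's own stack.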

There's not usually any sort of memory protection between tasks in a game, but it's assumed you know what you're doing.

One thing I've always wondered about is what if you get interrupted again while you are halfway done saving registers to the stack? Like I heard something about disabling interrupts during sensitive operations like that, but wouldn't that then risk missing an event entirely instead?
There's a flag set in hardware when /INT is asserted. At this point you can smack /INT around all you want, you're not getting any more interrupts firing.

Once you're done you clear the flag, either explicitly in the CPU's "condition code register" in some chips or by using a specific "Return from Interrupt" opcode that works like a "normal" subroutine return but clears the flag.

We use the analogy of a doorbell to describe interrupts a lot. In truth it's like if your doorbell could only be rung once, until you open and shut the door.
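The doorbell analogy as a toy C model, assuming an illustrative CPU with a single mask flag (like the I bit in a 6800-style condition code register) that hardware sets on interrupt entry and a return-from-interrupt clears. The names are hypothetical, not any real chip's.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: taking an interrupt sets a mask flag, and only the
   return-from-interrupt opcode clears it again. */
typedef struct {
    bool irq_masked;   /* set by "hardware" on interrupt entry */
    int  in_handler;   /* nesting depth: never exceeds 1 in this model */
    int  handled;      /* interrupts actually serviced */
} Cpu;

/* Called whenever the device pulls /INT low. Returns true if it was taken. */
static bool assert_int(Cpu *c) {
    if (c->irq_masked) return false;  /* bell already rung: ignored */
    c->irq_masked = true;             /* flag set automatically on entry */
    c->in_handler++;
    c->handled++;
    return true;
}

/* RTI: works like a normal return, but also clears the mask flag. */
static void rti(Cpu *c) {
    assert(c->in_handler > 0);
    c->in_handler--;
    c->irq_masked = false;
}
```

Whether a request that arrives while masked is simply ignored (as modelled here) or is taken right after RTI depends on the chip, for example on whether /INT is level- or edge-triggered.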

Yes, one approach is to disable interrupts during these sensitive tasks.

Once you enable interrupts again, the interrupt controller will trigger an interrupt if one occurred while they were disabled, so it works out. All systems work differently, though: some can't queue interrupts, meaning that if two or more interrupts occurred while disabled, you will lose the extra ones. And if interrupts can be queued, the queue has a finite length.
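The two behaviours described above, side by side in a hypothetical C model: a controller that remembers requests made while interrupts are disabled either in a single pending bit or in a small bounded queue (depth 4 is an arbitrary assumption).

```c
#include <assert.h>
#include <stdbool.h>

enum { QUEUE_DEPTH = 4 };   /* assumed queue length */

typedef struct {
    bool enabled;
    bool pending_bit;   /* scheme 1: one latch, extra requests merge */
    int  pending_count; /* scheme 2: bounded queue of requests */
    int  lost;          /* requests the queue scheme had to drop */
    int  runs_latch;    /* handler runs under the single-bit scheme */
    int  runs_queue;    /* handler runs under the queue scheme */
} IrqCtl;

static void raise_irq(IrqCtl *c) {
    if (c->enabled) {               /* serviced straight away in both schemes */
        c->runs_latch++;
        c->runs_queue++;
        return;
    }
    c->pending_bit = true;          /* 2nd, 3rd, ... requests collapse into one */
    if (c->pending_count < QUEUE_DEPTH) c->pending_count++;
    else c->lost++;                 /* queue full: request dropped */
}

static void disable_irqs(IrqCtl *c) { c->enabled = false; }

/* Re-enabling delivers whatever was remembered while disabled. */
static void enable_irqs(IrqCtl *c) {
    c->enabled = true;
    if (c->pending_bit) c->runs_latch++;
    c->runs_queue += c->pending_count;
    c->pending_bit = false;
    c->pending_count = 0;
}
```

With six requests arriving while disabled, the single-bit scheme runs the handler once for all of them and the queue scheme runs it four times, dropping two.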

Another approach is for the interrupt handler to save the current state of the world and restore it again afterwards, meaning that if you're halfway through saving registers to the stack, you just continue as if nothing happened and save the rest of the registers. Note that you can't be interrupted in the middle of a single instruction.

Of course, that pushes the problem one level deeper: what if an interrupt handler gets interrupted by another interrupt while it's saving the state of the world? On some systems this can nest (up to a point where things just break badly); on others, or depending on the interrupt type, you need to disable all or certain other interrupts again.

This works out when you get all your code/ducks in a row, which takes a lot of hard work, sweat, reading very low-level technical documentation, and sometimes trial and error, as devices and documentation may be lying, buggy, or absent.

Most of us sit on top of an OS kernel where all of this has been thought about and handled over years or decades, or, even when working on simpler bare-bones and embedded systems, can use an existing kernel or library, and we should thank the people who make all of this work.

Well said. One thing to add is that it's not uncommon for the CPU to disable further interrupts as part of its automatic handling, or for that to be configurable, so it's often not something the interrupt service routine needs to do explicitly. Sometimes it is, though; there are so many options.

If you're on a system where interrupts are disabled automatically, and you actually do want re-entrancy sometimes, you can usually make that happen too, but you might wait until you get to a safer place (maybe switched to a kernel stack or ??? again, so many options)

If you are interrupted by a higher-priority handler, that handler ALSO saves and restores registers, so whatever was interrupted doesn't even know. The registers after the interruption should have the same values as before, so whatever was saving them is still saving them properly. Typically the same interrupt is not called again while it is already being serviced. Interrupts are handled by flags, so when several flags are raised at once, handlers of the same priority run one after another, and each should clear its own flag (sometimes the processor does this automatically), often right at the start of the routine. If another interrupt then sets the flag before the routine finishes, the routine will run again after it finishes.
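A simulated C sketch of the flag-per-source dispatching described above. Assumptions: three hypothetical sources, lower bit number meaning higher priority (real controllers differ), and a UART handler that pretends more data arrives while it runs.

```c
#include <assert.h>
#include <stdint.h>

/* Each interrupt source owns one bit in a flag register. */
enum { SRC_TIMER = 0, SRC_VBLANK = 1, SRC_UART = 2, NSRC = 3 };

static volatile uint32_t irq_flags;  /* set by "hardware" */
static int runs[NSRC];               /* how often each handler ran */
static int uart_refires;             /* bytes that arrive mid-handler */

static void handle(int src) {
    runs[src]++;
    /* Another byte arrives while the UART handler is still busy: the
       hardware sets the flag again, so the dispatcher comes back. */
    if (src == SRC_UART && uart_refires > 0) {
        uart_refires--;
        irq_flags |= 1u << SRC_UART;
    }
}

/* Service pending sources one after another, highest priority first.
   Each flag is cleared *before* its handler runs, so a request that
   arrives during the handler is not lost: it triggers another pass. */
static void dispatch(void) {
    while (irq_flags) {
        for (int src = 0; src < NSRC; src++) {
            uint32_t bit = 1u << src;
            if (irq_flags & bit) {
                irq_flags &= ~bit;   /* acknowledge at the start of the routine */
                handle(src);
                break;               /* re-check from the top priority */
            }
        }
    }
}
```

Clearing the flag at the start rather than the end is what makes the re-run work: if the handler cleared it last, a request arriving mid-handler would be wiped out.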
When interrupts are disabled events that occur during execution are usually buffered to the next possible moment.
> In a typical Windows 10 installation with many background processes and services, the CPU context switching rate can vary greatly depending on the specific system's hardware configuration, running processes, and workload. However, on a typical system, the context switching rate can be anywhere from a few hundred to a few thousand times per second.

Would you agree with this statement from ChatGPT? Is the Windows kernel handling thousands of context switches and time slicing processes the way you described? pushad/pushfd + popad/popfd

Yeah, more or less; mostly more. pusha/pushad is one of those instructions that sounded good but isn't used much (it became invalid in amd64). Windows will push the registers one at a time, and maybe the FPU, MMX, SSE, etc. registers; of course, that's a lot of extra pushing, so there are strategies to avoid it if the thread doesn't use them. If you switch to a different task, you're going to need to load its page tables, and these days you've gotta flush a bunch of caches to avoid Spectre (although you shouldn't avoid the Spectre game from the 90s, that was nifty).

If you're good at Windows, you can probably get a count of context switches per second on your system, under your load. Context switch counts generally include interrupts as well as calls into the kernel from userspace. A server workload can go up to hundreds of thousands, maybe millions, per second, again depending on your load.
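One simple version of the "avoid it if the thread doesn't use them" idea, as a hypothetical C sketch: keep a per-thread flag saying whether the thread touched the extended (FPU/SIMD) registers since it was last scheduled in, and skip the big save when it didn't. The 512-byte size (the size of an FXSAVE area) is used only as a plausible number; all names are made up, and real kernels also lean on hardware support.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { EXT_STATE_BYTES = 512 };

typedef struct {
    unsigned char ext_state[EXT_STATE_BYTES];  /* saved FPU/SIMD registers */
    bool used_ext;  /* touched since scheduled in? (a real kernel might set
                       this from a trap on first FPU use; here the caller does) */
} Thread;

static unsigned char cpu_ext_regs[EXT_STATE_BYTES];  /* simulated live registers */
static int ext_saves;                                /* big copies performed */

/* If the outgoing thread never touched the extended registers, they still
   hold exactly what was loaded for it, so its saved copy is already current
   and the save can be skipped. The incoming thread's state is always loaded. */
static void switch_ext_state(Thread *from, Thread *to) {
    if (from->used_ext) {
        memcpy(from->ext_state, cpu_ext_regs, EXT_STATE_BYTES);
        ext_saves++;
        from->used_ext = false;
    }
    memcpy(cpu_ext_regs, to->ext_state, EXT_STATE_BYTES);
}
```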

This seems wildly inefficient. Can’t we have multiple sets of registers? Not millions, but…

Are registers expensive in hardware? Why not have loads of them?

> windows will push the registers one at a time

Wouldn't Windows (and Linux) use the FXSAVE instruction instead?

Please stop using ChatGPT to write your comments. Nobody here is here to have a conversation with ChatGPT, and anyone who wants to talk with ChatGPT instead of actual human beings can do that privately without polluting the conversations of real human beings.
I think it would only be problematic if he was pretending that what ChatGPT said is what he said. Instead he asked a question to verify if what he found is true (same way you would do with other online sources).
No threads. Well, there are two CPUs, so you can count that as two threads. They run completely independently, executing different code with different instruction sets (ARM9 and ARM7TDMI).

I don’t know what you mean exactly by time scheduling. There are several timers that you can configure and you can set them to raise an interrupt when they finish. They can restart automatically.
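A simulated C sketch of such an auto-restarting timer. Assumption: a 16-bit up-counter that starts at a reload value and, on overflow past 0xFFFF, reloads and (if enabled) raises an interrupt, which is roughly how these handheld timers are usually described. The field names are illustrative, not real register names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t counter;
    uint16_t reload;
    bool     irq_enable;
    int      irqs_raised;
} Timer;

static void timer_start(Timer *t, uint16_t reload, bool irq) {
    t->reload = reload;
    t->counter = reload;
    t->irq_enable = irq;
    t->irqs_raised = 0;
}

/* Advance the timer by one input clock (after any prescaler). */
static void timer_tick(Timer *t) {
    if (t->counter == 0xFFFF) {      /* overflow */
        t->counter = t->reload;      /* restart automatically */
        if (t->irq_enable) t->irqs_raised++;
    } else {
        t->counter++;
    }
}

/* Reload value that overflows every `period` ticks (1..65536). */
static uint16_t reload_for_period(uint32_t period) {
    assert(period >= 1 && period <= 65536);
    return (uint16_t)(65536u - period);
}
```

For example, with a (hypothetical) 1 kHz input clock after prescaling, `reload_for_period(100)` gives an interrupt ten times a second.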

There is an interrupt vector for both software and hardware interrupts. Software ones are raised by the swi assembly instruction. Hardware ones are raised by the aforementioned timers, but also by the display (vertical and horizontal blank), the sound system, Wi-Fi, etc. They can be enabled/disabled by setting a specific bit in a specific memory location (IE, the interrupt enable register). Your interrupt handler is supposed to restore registers and clear the interrupt flag at the end.
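A simulated C sketch of the enable/flag pair. Assumptions: IE holds which sources may interrupt, IF holds which are requesting, and IF is acknowledged with write-1-to-clear semantics (how the DS/GBA IF register is commonly documented). In this model a request is latched in IF regardless of IE; bit assignments are illustrative.

```c
#include <assert.h>
#include <stdint.h>

enum { IRQ_VBLANK = 1u << 0, IRQ_HBLANK = 1u << 1, IRQ_TIMER0 = 1u << 3 };

static uint32_t reg_ie, reg_if;      /* simulated IE and IF registers */
static int vblank_count, timer_count;

/* Hardware side: a firing source latches its request bit in IF. */
static void hw_raise(uint32_t src) { reg_if |= src; }

/* Write-1-to-clear: 1 bits clear requests, 0 bits leave them alone. */
static void write_if(uint32_t value) { reg_if &= ~value; }

/* The CPU only takes the interrupt if an enabled source is requesting. */
static int irq_line_active(void) { return (reg_ie & reg_if) != 0; }

/* Handler: find out who rang, do the work, then acknowledge exactly the
   sources it handled, at the end as described above. */
static void irq_handler(void) {
    uint32_t pending = reg_ie & reg_if;
    if (pending & IRQ_VBLANK) vblank_count++;
    if (pending & IRQ_TIMER0) timer_count++;
    write_if(pending);
}
```

Write-1-to-clear means the handler can acknowledge one source without a read-modify-write race that might wipe out a request that arrived in the meantime.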

Context switching is done manually.

I think libnds has an implementation of software threads. I don’t know how they work.

I'm still digging into the details, so I can't answer perfectly, but as far as I can tell there are no threads or processes; it's more like a microcontroller. I don't think there are any interrupts for things like inputs: you have a main loop, and you are expected to poll the inputs frequently. I suspect there are timers which can raise interrupts like you'd get on a microcontroller, though I haven't tried this yet.
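A C sketch of that per-frame polling: sample a key-state register once per loop iteration and derive "just pressed" and "just released" by comparing with the previous frame. The active-low convention (0 = pressed) matches how many handheld keypad registers are described, but the bit layout here is hypothetical. libnds offers helpers along these lines (e.g. `scanKeys()` and `keysDown()`).

```c
#include <assert.h>
#include <stdint.h>

enum { KEY_A = 1u << 0, KEY_B = 1u << 1, KEY_START = 1u << 3 };

typedef struct {
    uint16_t held, prev;
    uint16_t down, up;   /* edges since the last poll */
} Keys;

/* raw: value read from the (simulated) keypad register, active-low. */
static void poll_keys(Keys *k, uint16_t raw) {
    k->prev = k->held;
    k->held = (uint16_t)~raw;                /* convert to 1 = pressed */
    k->down = k->held & (uint16_t)~k->prev;  /* pressed now, not before */
    k->up   = (uint16_t)~k->held & k->prev;  /* pressed before, not now */
}
```

Polling once per frame is usually fine for a game: even at 60 frames per second, a human button press lasts several frames, so it won't be missed.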
Just looked on eBay and had no idea there were so many DS systems available.
The console sold 154 million units, and since it's so old, they are all finding their way to eBay. It's honestly more interesting to play with and easier to buy than a Raspberry Pi.
wondering if they meant variants
As early as the mid-70s, minicomputers started including microcontrollers to manage the boot environment. I once had a Sun machine that included what they called Lights Out Management: a little microcontroller that had control over the system power, etc., and was always available via its dedicated serial port, even when the machine was shut off. Everything is like that now. A smartphone will have multiple processors: some doing I/O interfacing, one to manage the battery, and a general-purpose processor in the radio hardware.

Most of these processors are fixed-in-ROM sort of machines, so the story for booting them individually is pretty simple. Much like a late-20th-century PC, when switched on (either by the power supply or by another processor), they start running BIOS code from a hardwired start address. Some need to have more software transferred to them after that.

Modern machines are really networks of computers in themselves. The low-level networking that brings all these parts together to support the main processors is not only poorly documented or completely undocumented; it's probably impossible for one person to fit it all in their head these days.

True. It's like what's happened with electronics, radio, video in that way. A solid grasp of the (always-present) basic components, and basic tech and design philosophy, goes a long way. It's like learning a language.

So easily understandable articles like this are essential for beginners, along with a short list of masterful books (e.g. those by Forrest Mims, for electronics) and playing with physical components! The rest is endless variations, but they're all speaking that language, 'cuz the laws don't change.

A recurring question I have is: how many microcontrollers/CPUs are in a modern personal computer? There are clearly a lot, but just how many?
- the embedded controller (EC)

- the CPU core in the chipset that runs the ME/PSP

- if the TPM is not an fTPM, it has a CPU, I'm sure.

- if your NIC has offload engines, it has a CPU or two.

- each storage device has a CPU.

- each Wi-Fi device has a CPU.

- the Thunderbolt controller takes firmware, so it has a CPU. I'd bet USB 3 and 4 controllers do too.

- any USB device has a CPU on the other end accepting and interpreting commands.

- same with any SCSI device.

- monitors have a CPU or two, one for the OSD settings and another to drive the display I'm sure.

- I think any NVIDIA or AMD graphics card has a CPU in there (in addition to the GPU).

- The following portable media has microcontroller firmware: SD cards, Memory Stick. (The now defunct SmartMedia and Olympus XD were raw NAND).

- your optical mouse has a CPU as well to process optical data.

- obviously printers have CPUs and firmware, probably separate ones for the web UI and the part that drives the actual print mechanism, and I'd bet a separate one for scanning and image processing.

- any keyboard has a microcontroller and firmware (there are open source keyboard firmwares)

- SIM cards for cellular connections have their own CPU and firmware.

Apple laptop chargers had a 16-bit MSP430 microcontroller in them, with roughly the same performance as the original Macintosh.

Apple Thunderbolt cables have an ARM chip in the connector at each end.

Every Intel CPU has at least one smaller x86 core that isn't visible to the user and runs MINIX. This CPU is responsible for microcode updates.

> obviously printers have CPUs and firmware

Famously, Apple's first laser printer had a faster 68000 than the Mac it connected to.

> Memory controllers contain the logic necessary to read and write to DRAM, and to "refresh" the DRAM. Without constant refreshes, DRAM will lose the data written to it as the capacitors leak their charge within a fraction of a second (not more than 64 milliseconds according to JEDEC standards).

Do they use CPUs now?
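The quoted refresh requirement boils down to simple periodic arithmetic. A back-of-the-envelope C sketch, assuming 8192 refresh commands per 64 ms window (the figure commonly quoted for DDR3/DDR4 at normal temperature) and an assumed 350 ns refresh cycle time, roughly that of a DDR4 8 Gb device:

```c
#include <assert.h>

/* Average interval between refresh commands, in microseconds. */
static double refresh_interval_us(double retention_ms, unsigned commands) {
    assert(commands > 0);
    return retention_ms * 1000.0 / commands;
}

/* Fraction of time the DRAM is busy refreshing: one refresh of
   trfc_ns nanoseconds every trefi_us microseconds. */
static double refresh_overhead(double trfc_ns, double trefi_us) {
    assert(trefi_us > 0.0);
    return trfc_ns / (trefi_us * 1000.0);
}
```

Under those assumptions the controller issues a refresh about every 7.8 µs and loses roughly 4.5% of the time to it: a fixed schedule that a hardware state machine can follow.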

I did hear that IBM was developing serial RAM (not NVRAM) with an onboard controller on the memory modules - with the need for firmware. Beyond that I didn't think memory controllers ran instructions from ROM or other instruction storage like a CPU.

As far as flash memory, they definitely use a CPU and that's what I meant by "storage" in my list. :)

If you want something you can almost fully understand, I'd go another step or two lower, to a microcontroller like an AVR or one of the old Motorola chips (68HC11 or something). These chips were actually designed by hand and expected to be programmed by hand, and their documentation reflects this. They also have much less hidden microarchitectural state than a modern CPU.

Once you're familiar with that, move to a more modern microcontroller like an Arm Cortex-M0, and after that maybe something with off-chip memory, an MMU, etc.

Those 8-bit controllers with (exclusively) on-chip SRAM and Flash are indeed quite simple at reset (they can be summed up as "jump to 0x0000"), but they get surprisingly complex once you get into bootloaders, interrupt service routines, and so on.
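A toy C model of what "jump to 0x0000" means: a table at the bottom of flash whose entry 0 is the reset vector and whose later entries are interrupt vectors. The vector numbers and handler names are hypothetical. On an AVR the entries are jump instructions rather than addresses, and on a Cortex-M word 0 is the initial stack pointer, so treat this as the general shape only.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*Vector)(void);

static int boot_count, timer_ticks;

static void reset_handler(void) { boot_count++; /* set up stack, .data, .bss, then main() */ }
static void timer_isr(void)     { timer_ticks++; }
static void unhandled_isr(void) { /* real firmware might reset or hang here */ }

enum { VEC_RESET = 0, VEC_TIMER = 1, VEC_UART = 2, NUM_VECTORS = 3 };

static const Vector vector_table[NUM_VECTORS] = {
    [VEC_RESET] = reset_handler,
    [VEC_TIMER] = timer_isr,
    [VEC_UART]  = unhandled_isr,
};

/* The CPU's side of the deal: on reset or interrupt n, fetch entry n
   and jump there. */
static void cpu_take_vector(size_t n) {
    assert(n < NUM_VECTORS);
    vector_table[n]();
}
```

A bootloader complicates this picture mainly because it and the application each want their own table, so the chip or the bootloader has to redirect vectors.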

The proprietary blobs and the GPU on the Raspberry Pi make it basically impossible to fully understand what's happening. Instead, I'd recommend learning with the TI BeagleBone Black, which has an ARM Cortex-A8 with an MMU and lots of open documentation.

Look at the IBM PC/XT/AT if you want something more documented. IBM supplied schematics and even BIOS listings for those machines, the newest being a 286 (AT). Thanks to doing some BIOS RE work many years ago, the address F000:E05B still remains in my mind...
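For anyone wondering why that address sticks: the 8088 in the PC/XT comes out of reset at FFFF:0000 (physical 0xFFFF0), and IBM's BIOS puts a far jump to F000:E05B there. Real-mode addresses translate as segment × 16 + offset; a small C sketch, with the 20-bit wraparound of the 8088 (the AT's 286 has more address lines, which this does not model):

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode x86: physical = segment * 16 + offset, wrapped to 20 bits
   as on the 8088/8086. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

So F000:E05B is physical 0xFE05B, inside the top 64 KiB where the BIOS ROM lives.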
> For one, it's the GPU that actually does the initial boot process, and much of that is hard to find good info on. (https://raspberrypi.stackexchange.com/questions/14862/why-do...)

x86 is similar, since Intel ME (part of the PCH, whether in the chip or not) is needed to boot the CPU.

The PMC also needs to be online before the main cores. And on many machines there's still an external EC on the board, responsible for sequencing power states outside the chip. The x86 application cores actually start up quite late in the process, long after the SPI flash has been read out, memory and cache controllers initialized, etc.
Yup, in bigger systems you also often need to initialize/train DRAM, which means your first code essentially runs out of L2/L3 cache.

Here is a presentation about open firmware with a lot of stuff about boot process: https://www.youtube.com/watch?v=fTLsS_QZ8us

I think OpenPOWER has a fully open bootloader. But it's not exactly cheap to try.

Those systems start up in "Cache Contained" mode (often called Cache-as-RAM), where the boot ROM is copied into the CPU cache and there's no main memory yet. The code in the ROM has to initialize main memory before it can use it.

This is very common in embedded systems, and honestly I'm not surprised to see this, even with projects like Raspberry Pi... although, I do think it's a big shortcoming for a project like that.

In a "real project" you would be relying on your suppliers and other teams/colleagues for many of the "hard questions". Chances are you'd have a contact at the silicon manufacturer, to take just one major example.

There would be a boot team, or sometimes just one boot developer. If anything goes wrong, or you want to change anything, you would be pretty helpless without them. You can go to the various specs and code, but this can be quite in depth if you're starting from scratch.

Change project or micro? Chances are that all goes out the window, and you have to start from scratch.

Would you mind sharing some of your favorite youtube videos that you've come across?
Chiming in with everyone else - RPi is way too complex.

But another suggestion, if you want a modern alternative to understand: RISC-V has an open boot ecosystem. You can just try it in QEMU and maybe buy a board once you get more advanced.

> Turns out the Raspberry Pi (and I'm guessing many other systems) are pretty confusing to understand at boot time.

That is because of how the ARM ecosystem works. There is no standard way of integrating an ARM CPU into a product, as Arm, the company, just sells base IP and not complete chips. Every ARM licensee, from Qualcomm to Apple to Nvidia, is free to design their own extensions and integrations into their SoC, and there is no standard for this. That makes it hard to write the kind of generic tutorial you see in the x86 world.

Maybe a good place to start is an old video game console. Those have a lot of community info because of interest in retro gaming/emulation, plus they're simpler than modern hardware.
Might want to look at MicroPython. The ports have various init functions for various chips/systems. It might not be a full OS, but it is pretty low-level code you can compare across various systems.

https://github.com/micropython/micropython/tree/master/ports

Check out the LibreRPi project, they are aiming to reverse engineer, document and replace the GPU firmware boot blob and other blobs.

https://github.com/librerpi/

Macs with a T2 chip first start execution there before pulling the Intel CPU out of reset.
This keynote about hardware vs. the operating system's knowledge of the hardware was enlightening:

https://www.youtube.com/watch?v=36myc8wQhLo

You could start with a simple microcontroller like an AVR. Pick up an Arduino Uno or a Mega and an ICSP programmer, then write your own bootloader for it.
RPis unfortunately have lousy documentation for the low-level stuff.

Just about anything else (including dodgy Chinese substitutes) is better, sadly.

The "whole computer" could also be thought to start in memory, because you can have a computer with memory and a manual operator, but you can't have a computer with an operator and no memory.
Yeah, you're better off just getting something to boot in QEMU. Target a 286 or something.