by peter_d_sherman 894 days ago
Most people (including most Unix greybeards!) really don't understand Unix's 'init' (AKA "the init daemon", "the init process", etc., etc.) -- much less any of its massively-increasing-in-LOC (and complexity!) successor programs...

So we need to start with 'init'.

'init' -- even in its absolute first, simplest incarnation -- is still too complex to understand correctly!

You see, we need to shift perspectives!

We need to shift perspectives from that of a longtime System Administrator -- to that of a new barebone OS programmer.

What is 'init'?

Is 'init' a program that handles runlevels, starts and stops services, that mounts filesystems, that processes messages, that captures dead processes, that waits for hardware to become available, that logs and maintains informational/database/etc files, that starts audio, that starts X11, that starts the GUI, that acts as a proxy for sockets, or does anything else with the system?

No!

From the point of view of a new barebone OS programmer (as Dennis Ritchie and Ken Thompson were when they invented Unix and invented 'init') -- 'init' is NONE of these things!

'init' is only THE FIRST PROGRAM, THE FIRST COMPUTER CODE THAT RUNS IN USER SPACE.

And that's it!

That is all that 'init' ever is, or ever was!

(User space, to recap, is the unprivileged AKA "non-supervisor" memory and execution mode in which ordinary (AKA "user-land") code runs: https://en.wikipedia.org/wiki/User_space_and_kernel_space)

'init' (and every single 'init' successor program, i.e., OpenRC, systemd, etc.) -- is the first program, the first set of computer code OUTSIDE OF KERNEL CODE (which has been running and is currently still running) to be run by the system.

Now, what should that first program do?

See, that's the magic question -- which gives rise to all that is to follow!

In theory you could have an OS where the 'init' program, or its equivalent -- did absolutely nothing! But that wouldn't be very productive!

If the 'init' program isn't itself a shell program (i.e., sh, bash, etc.) -- then (because there's no GUI at this point) the computer will not be able to accept typed command-line commands -- which is the first thing that you want a new OS to do!

So now our 'init' expands in scope (and lines of code)!

Our 'init' could be hardcoded to launch 'sh' or 'bash' (or whatever shell program exists) -- but what if the user wants to change that?
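
To make the "hardcoded shell" idea concrete, here is a minimal sketch in Python (a real 'init' would more likely be written in C). The names DEFAULT_SHELL and run_and_wait are made up for illustration; the only real calls are os.fork, os.execv and os.waitpid.

```python
import os

DEFAULT_SHELL = "/bin/sh"  # assumption: the shell path is simply hardcoded


def run_and_wait(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with the target program.
        try:
            os.execv(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed; 127 mirrors the shell convention
    # Parent: block until the child terminates.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # child was killed by a signal


if __name__ == "__main__":
    # Demo only: run one shell command instead of an interactive shell.
    print(run_and_wait([DEFAULT_SHELL, "-c", "exit 3"]))
```

A real 'init' would call something like run_and_wait([DEFAULT_SHELL]) in a loop and never return, since the kernel does not expect PID 1 to exit.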

OK, so now we need our first configuration file. Where to put that exactly?

Oh, it's on a filesystem that hasn't been mounted yet?

Well, maybe init should mount that filesystem!
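
A rough sketch of that first configuration step, in Python: read the shell path from a file, and fall back to a built-in default if the file cannot be read (for instance because its filesystem is not mounted yet). The path /etc/init.conf and its one-path-per-file format are hypothetical, not something any real 'init' uses; actually mounting the filesystem is left out.

```python
DEFAULT_SHELL = "/bin/sh"        # assumption: built-in fallback
CONFIG_PATH = "/etc/init.conf"   # hypothetical config file, for illustration only


def choose_shell(config_path=CONFIG_PATH, default=DEFAULT_SHELL):
    """Return the shell named in the config file, or the default."""
    try:
        with open(config_path) as f:
            for line in f:
                line = line.strip()
                # First non-blank, non-comment line is taken as the shell path.
                if line and not line.startswith("#"):
                    return line
    except OSError:
        # File missing, unreadable, or its filesystem is not mounted: fall back.
        pass
    return default
```

Even this tiny version already has to decide what happens when the file is absent, which is exactly how the scope starts to grow.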

Point is, there's a set of problems (and sub-problems!) -- which give rise to increasing and increasing init's functionality over time!

init, as the first user-space program for an OS to run, on whatever OS it is run on, in whatever form it is in -- could simply be written to run and 'outsource' all of its functionality to other programs...

But init (as it evolved into its very large LOC complex descendants) -- became a "dumping ground" -- for functionality that was inconvenient to go in other places and/or to be outsourced to other programs.

See, all of the code in all userland Linux utilities -- could in theory be grafted together into one big super program in userspace.

It would have the same functionality as all of the individual Unix/Linux command-line programs put together (and maybe that would be desirable to some people). But from a Software Engineering "separation of concerns" AKA dependency reduction AKA modularity AKA "do one thing and do it well" AKA loose-coupling perspective -- doing that might not be so desirable!

And yet, with the complexity brought about by 'init' descendants -- it seems like we're going down that exact route!

Which leads us full circle (because history always repeats itself!) -- back to the reason why Unix was created -- because of the complexity and problems brought about by the complexity of its predecessor, Multics!

https://en.wikipedia.org/wiki/Multics

Point is -- 'init' in whatever form it takes -- is by no means obligated to do anything -- although if it is to do nothing, then it should at least launch one other program which will do something! If that's the case, then why not put that under user control? But wait, if we're doing that, why not make it launch multiple other programs! OK, now we need a file to tell init where that should be! But what if the filesystem for that file is not mounted?

Anyway, you see how the "rabbit hole" of problems (and increasing LOC complexity) forms!

Related: https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-a...

2 comments

> That is all that 'init' ever is, or ever was!

Except it is not. The "init" process seen by users as PID 1 on the usual Linux distributions is far from the first program started by the kernel. There are usually hundreds of processes started before that PID 1 process gets started.

>There are usually hundreds of processes started before that PID 1 process gets started.

By the time PID 1 is started, the OS has already run OS (kernel) code -- that much is true!

Loosely, you are calling that OS code -- "processes"

We could call that OS code "processes" -- however, there is a difference between this loose definition of what a "process" is, and what a "process" is defined to be by Unix/Linux standards and conventions.

The OS code that the OS runs, the kernel code -- typically does not have a 'PID' (Process ID) associated with any of it.

To understand this, let's remember what Unix/Linux (or any OS) is -- it's a software abstraction layer over hardware.

Its goal is to act as a hardware abstraction layer, but also as a moderator, a resource arbiter for other pieces of software, other code -- to run on the same hardware, the same machine, at the same time.

This other code -- is typically the user space (AKA non-privileged) software, the user-space code.

In order to manage that user-space code, AKA user programs, AKA user-installed software, and because several (or more) of these programs might be and probably will be running simultaneously, each one is assigned a unique positive integer handle (functionally the equivalent of a key in key-value programming).

That unique positive integer handle assigned to each user-space program (unique ones can also be assigned to threads of programs) -- is typically the PID, or Process IDentifier.

That's WHY they exist.

PID 1 is the Process Identifier for the first user-space program to be run by the operating system, typically 'init' -- or a descendant.
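
As a small illustration of PIDs as unique handles, this Python sketch forks a few short-lived children and collects the PIDs that fork() hands back to the parent. The function name spawn_children is made up; the PID values themselves depend on the system.

```python
import os


def spawn_children(n):
    """Fork n short-lived children and return their PIDs as seen by the parent."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child: exit immediately
        pids.append(pid)      # parent: fork() returned the child's PID
    for pid in pids:
        os.waitpid(pid, 0)    # reap each child so no zombie is left behind
    return pids


if __name__ == "__main__":
    print("parent:", os.getpid(), "children:", spawn_children(3))
```

Because no child is reaped until all of them have been forked, the kernel cannot reuse any of those PIDs in the meantime, so the returned values are distinct.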

If "There are usually hundreds of processes started before that PID 1 process gets started", which is what you're calling "processes" (kindly disambiguate exactly what you mean when you say "process" please!), then WHY is it that they are not given ascending unique positive integer numbers and WHY does this numbering, should it exist, force PID 1 to be some other higher-numbered PID?

If "hundreds of [user-space] processes [are] started before that PID 1 process gets started", then WHY exactly, is the first user-space program to run PID 1 -- and not some higher numbered PID?

?

???

What Lennart found is that pid 1 is the only process in the whole userland that can ensure a process is terminated without risk of race conditions. Hence, all the process management stuff HAS to be in pid 1 in order to provide consistency guarantees. Hence, systemd.

PID 1 as the first user-space program/process (historically the 'init' program), has always been "the process management process".

In other words, it has historically been the program/process used to manage other programs/processes.

Consider the simplest possible Unix/Linux system with only two user-space processes. For a Kiosk, let's say.

The first user-space process/program is 'init'.

The second (and last!) user-space process/program is the kiosk software itself. Call it KIOSK.EXE (or /usr/bin/kiosk -- since this is Unix/Linux, or what-have-you)

OK, so now here's the problem!

What should happen if KIOSK.EXE or /usr/bin/kiosk -- should fail for some unexpected reason?

Some stack-heap collision, some segmentation fault, some core dump, some "blue screen of death", some abort, some failure...

OK, so now how do we:

a) Monitor that kiosk software to know if it's up and running;

b) Bring it back up and running if it has crashed?

Well, if there's only 'init', aka the PID 1 program/process in the system (other than the kiosk software, which has crashed and is no longer functional) -- then it must be the responsibility of 'init' (PID 1) to bring 'kiosk' back up!

To get 'kiosk' back up and running!

That's WHY the "process management stuff" functionality -- historically has been placed into 'init' (PID 1) -- simply because there was no other easy/simple/straightforward/obvious/user-friendly place to put it!
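
The monitor-and-restart job from (a) and (b) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: supervise, max_restarts and delay are made-up names, and the restart cap exists so the demo terminates, whereas an 'init' for a real kiosk would presumably loop forever.

```python
import os
import time


def supervise(argv, max_restarts=3, delay=0.0):
    """Run argv, restarting it whenever it exits abnormally.

    Returns the number of times the program was launched.
    """
    launches = 0
    while True:
        pid = os.fork()
        if pid == 0:
            # Child: become the supervised program (e.g. the kiosk).
            try:
                os.execv(argv[0], argv)
            except OSError:
                os._exit(127)  # exec failed
        launches += 1
        # (a) Monitor: waitpid blocks until the child dies.
        _, status = os.waitpid(pid, 0)
        clean = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
        if clean or launches > max_restarts:
            return launches
        # (b) Restart: pause briefly, then loop around and launch again.
        time.sleep(delay)
```

For example, supervise(["/usr/bin/kiosk"]) would relaunch the kiosk after a crash, using the /usr/bin/kiosk path from the scenario above. Note that the parent learns about the crash only because it is the process that forked the child, which is one reason this job tends to land in 'init'.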

But 'init' ultimately is just a user-space program.

It could be programmed to do nothing, it could be programmed to do everything (its code could in theory be fused with the kiosk program, although then it becomes the OS's responsibility to restart that program if it crashes!)

But ultimately 'init' -- is just a program!

It could, if programmed to, do whatever any other user-space program could do.

It could also be programmed not to do anything -- or offer the barest minimal functionality.

What 'init' does or does not do, or is supposed to do or not supposed to do -- is truly at the behest of any programmer who wishes to work with the source code of 'init', in whatever form that may take...

Nothing "HAS to" be any particular way for any OS or user program if you have the source code, if you know what it does, and if you know what you are doing...

There are no "HAS to's" -- for those that have source code and understand it.