by jjnoakes 3446 days ago
Most of the statements aren't really conducive to rebuttals because they are lacking substance.

But I can imagine what xenadu02 might have meant, if you like, and provide some counter arguments.

Signals aren't "garbage" (whatever that means).

Signal handlers can call APIs (the set of async-signal-safe APIs). They can't call non-async-signal-safe APIs, not because of threads, but because a signal can interrupt a routine at any point (necessary for asynchronous notification of certain events which must be handled before the normal instruction control flow can be resumed), and that interrupted routine may not have been written to be reentrant.

This is true even without threads in the picture.

The fork/exec model is not "garbage". It is actually a fairly nice alternative to the "provide one API to start a child process and give it a large number of parameters for all possible situations". And you can call plenty of APIs between fork and exec in the child safely, just like from signal handlers.
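As a rough illustration of what "calling APIs between fork and exec" looks like, here is a sketch in C. The function name spawn_in_new_session and the exit codes 126/127 are my own choices for the example, not anything the model prescribes.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `path` with `argv` in a new session, with stdout redirected to
 * out_fd. Returns the child's pid, or -1 if fork() failed. */
pid_t spawn_in_new_session(const char *path, char *const argv[], int out_fd)
{
    pid_t pid = fork();
    if (pid != 0)
        return pid;               /* parent, or fork() error (-1) */

    /* Child: everything from here to execv() is async-signal-safe. */
    if (setsid() == -1)
        _exit(126);
    if (dup2(out_fd, STDOUT_FILENO) == -1)
        _exit(126);
    if (out_fd != STDOUT_FILENO)
        close(out_fd);
    execv(path, argv);
    _exit(127);                   /* reached only if execv() failed */
}
```

The session change and the redirection are ordinary system calls made by the child on itself, so no extra parameters are needed on the call that starts the process.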

I haven't dealt with dependency hell ever since shared libraries got sonames.

The rest of the comment doesn't list anything of substance. If you want rebuttals for "the file system layout is a bad design" or "the C compilation is a bad design" or anything else, provide some reasons why those are bad designs; some of those reasons may be valid criticism, and some may not be, but one can't just make vacuous statements like that and expect a reasonable discussion to follow.


Unix signals have been called garbage by some and "unfixable" by others [1]. That article, part of the series "Unfixable designs", traces the evolution of signal handling from sigvec() through sigaction() to signalfd(), a rocky history fraught with problems.

> So while signal handlers are perfectly workable for some of the early use cases (e.g. SIGSEGV) it seems that they were pushed beyond their competence very early, thus producing a broken design for which there have been repeated attempts at repair. While it may now be possible to write code that handles signal delivery reliably, it is still very easy to get it wrong. The replacement that we find in signalfd() promises to make event handling significantly easier and so more reliable.

Another critic makes the case that "signalfd is [also] useless" [2]:

> "UNIX[] signals are probably one of the worst parts of the UNIX API, and that’s a relatively high bar."

Signals came up recently on HN when someone remarked that not even memset() is signal-safe! [3]

All in all, working with signals correctly requires mastering a tremendous degree of complexity. Other platforms have provided simpler APIs, such as Structured Exception Handling (SEH) [4].

[1] https://lwn.net/Articles/414618/

[2] article link from https://news.ycombinator.com/item?id=9564975

[3] https://news.ycombinator.com/item?id=13313563

[4] An HN comment describing how it's simpler: https://news.ycombinator.com/item?id=13323870

P.S. Please note that the views quoted above are not necessarily my views.

Like I said, there are some valid arguments on both sides. But a blanket "signals are garbage" is not useful or correct.
I'm not going to defend everything xenadu02 said, but I think there were some points that resonated with me even though I agree they could be expressed more constructively.

> Why does ls do sorting? Why does grep do -R recursive searching? How is that "Do one thing and do it well"?

I think these are valid examples of how Unix itself fails to follow the "Unix philosophy" of "Do One Thing and Do It Well".

> The fork/exec model is not "garbage". It is actually a fairly nice alternative to the "provide one API to start a child process and give it a large number of parameters for all possible situations". And you can call plenty of APIs between fork and exec in the child safely, just like from signal handlers.

fork-exec complicates the implementation of threads (see atfork handlers). Rather than "a large number of parameters for all possible situations", another alternative would be to have: (1) a call which, given an executable name and arguments, returns an opaque handle (or file descriptor) representing the process to be started; (2) a bunch of further calls to set attributes on that handle (new features could add new APIs acting on the handle, or an extensible API like ioctl could be used; if there is also a handle representing the current process, then each attribute needs only one API call, whether it is being set on the current process or on a child to be started); (3) finally, a start call which turns the process-to-be-started handle into a running process handle.

> Unix file permissions are shit

The user-group-other model is arguably too limiting. ACLs are a better idea, but then should you use POSIX ACLs or NFSv4 ACLs?

The distinction between primary group ID and supplementary group IDs is silly.

Why must every file have both a UID and a GID? For files owned by a single user, you end up creating a dummy group like "staff" or so on just to obey the rule that every file must have a GID. For shared files, e.g. project files, files generally end up owned by their creator, even though in a business sense they really belong to the project not to whoever created them. It would make more sense if the owner could be either a user or a group, and then also have zero or more non-owning groups associated with it.

In most cases permissions should only exist on the directory, and then automatically apply to any files in the directory. (In most cases every file in the same directory should have the same permission; Unix bases its design on the exception rather than the rule.) Of course, hard links make this impossible, but I think hard links were a mistake.

The executable permission bits actually do double duty as a file type indicator. That's rather ugly. If Unix had explicit file types (rather than just a naming convention of file extensions), then certain file types could be declared to be executable. Executable permission would then mean "you are allowed to execute this if it is an executable" instead of "this is an executable". Stuff like the +x vs +X distinction in chmod would never have been necessary.

> Let's not even get into everything is a file

Unix would have been much better if everything were a file descriptor, rather than having stuff like pid_t. Linux at least is evolving in this direction. Plan 9 does it better. Even the Windows NT philosophy of "everything is a handle" is better than the traditional Unix approach.

Regarding ACLs, I'd say that there's little choice here: it has to be NFSv4.

The rationale for this is that POSIX ACLs are, firstly, too simple to model what we need. They are also non-standard (POSIX.1e ACLs are a DRAFT specification which was never ratified).

NFSv4 ACLs are vastly more featureful, already implemented to support NFSv4 in kernel, though not available in userspace AFAICT. On FreeBSD and other platforms using ZFS, they are also used by ZFS and are directly exposed to userspace, making rich ACLs usable as the default permissions model system-wide when running on ZFS. Linux, unfortunately, doesn't yet do any of this, even when using ZFS.

The irony is that whilst the standards document was never ratified most people implemented it anyway. So actually, they are a standard. (-:
Programs have features because they are useful. Some features may not fit your view of what the philosophy should dictate, and that's OK. Having a recursive ls doesn't bother me for example.

Fork-and-exec isn't complicated by threads. Only fork-and-keep-executing is.

UNIX doesn't have a naming convention using file extensions.

Some of your points are valid opinions that are shared by others, but I don't know how much they have to do with the UNIX philosophy.

Some APIs can be improved, sure. And some are being improved. It takes time because of unix's success and most systems' desire to remain backward compatible (especially in source form).

> Fork-and-exec isn't complicated by threads. Only fork-and-keep-executing is.

Another issue is that fork-and-exec doesn't work well with languages with complicated runtimes, e.g. multithreaded garbage collection. It forces you to use a lower level language (such as C) to write all the code between fork and exec. An API based on process handles with a separate "start" call to convert a not-yet-started handle into a running process wouldn't have that deficiency.

Another issue is that it is very hard to implement robust error handling without race conditions in the fork-exec model. What if the child process encounters an error between the fork and the exec? How does it notify the parent process of exactly what error it got (e.g. "setsid failed")? You need some sort of IPC mechanism between the child and the parent. And such an IPC mechanism is prone to race conditions. By contrast, the process handle-based API I suggested doesn't have this problem since it doesn't introduce more concurrency into the system than is absolutely necessary.

> UNIX doesn't have a naming convention using file extensions.

Yes it does. The average Unix system is full of file extensions like .c, .h, .so, .html, etc. Even in Unix V1 file extensions were used as a convention - http://minnie.tuhs.org/cgi-bin/utree.pl?file=V1

> Some of your points are valid opinions that are shared by others, but I don't know how much they have to do with the UNIX philosophy.

Is there a clear definition of what the "UNIX philosophy" is? Is any criticism of Unix systems as actually implemented a valid criticism of the "Unix philosophy"? Or do you want to define the "Unix philosophy" so vaguely as to put it beyond any possibility of criticism?

> Another issue is that fork-and-exec doesn't work well with languages with complicated runtimes

How are you doing fork-and-exec in a language with a large runtime? You are either using the language-provided APIs to do it, in which case they should document the restrictions on what you can call (and you should follow those), or you are dipping down into the C or system call layer to do your own fork-and-exec, in which case yeah, you still need to keep to the safe list of routines you can call between fork and exec, and you may have extra limitations since you are mucking around underneath your language's runtime (like you may have to unignore signals on your own, close file descriptors, etc). No surprises there.

> Another issue is that it is very hard to implement robust error handling without race conditions in the fork-exec model.

I don't think it is. You just print an error to stderr (write() is safe to call), and you return a bad error code (fork has built-in IPC for error codes via wait() in the parent).
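Roughly what I mean, as a sketch in C. The function name run_and_report and the message text are made up; exit code 127 for "exec failed" is borrowed from shell convention and is an assumption of this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, tries to exec `path`, and returns the child's exit code as seen
 * through waitpid(). Returns -1 if fork() or waitpid() failed, or if the
 * child was killed by a signal. */
int run_and_report(const char *path, char *const argv[])
{
    static const char msg[] = "exec failed\n";
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        /* Only reached if execv() failed. write() is async-signal-safe,
         * so it is fine to call it here. */
        ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
        (void)n;
        _exit(127);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The exit status is the only channel used here, so the parent sees a small integer and nothing more; anything richer is up to the application.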

> Is there a clear definition of what the "UNIX philosophy" is?

I don't know, ask the person who first invoked that phrase in this thread. They claimed it meant "do one thing and do it well" to them, and then they complained about things that didn't seem related to me (like file extensions, what does that have to do with programs "doing one thing"?).

> > Another issue is that fork-and-exec doesn't work well with languages with complicated runtimes

> How are you doing fork-and-exec in a language with a large runtime? You are either using the language-provided APIs to do it, in which case they should document the restrictions on what you can call (and you should follow those), or you are dipping down into the C or system call layer to do your own fork-and-exec, in which case yeah, you still need to keep to the safe list of routines you can call between fork and exec, and you may have extra limitations since you are mucking around underneath your language's runtime (like you may have to unignore signals on your own, close file descriptors, etc). No surprises there.

Let's say I am using JNA – https://github.com/java-native-access/jna – under Java. It is safe to call posix_spawn from Java code using JNA. It is safe to call the Windows API equivalent (CreateProcess). It would be safe to call the handle/descriptor-based API I proposed. It is not safe to call fork. This is an undeniable deficiency of the fork-exec approach which competing approaches don't have. Furthermore, whatever compensating advantages fork-exec may have, the handle/descriptor-based API I proposed has the same advantages without this disadvantage.
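For reference, this is roughly what the posix_spawn route looks like, sketched in C rather than through JNA. The wrapper name spawn_with_stdout is hypothetical; the posix_spawn* calls are the standard POSIX ones. The file-actions object is built up by separate calls before the process is started, which is loosely in the spirit of the handle-based API I described, though it is not the same thing.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Starts `path` with stdout redirected to out_fd. Returns 0 and stores
 * the child's pid in *pid_out on success; otherwise returns an errno
 * value (posix_spawn reports errors through its return value). */
int spawn_with_stdout(const char *path, char *const argv[],
                      int out_fd, pid_t *pid_out)
{
    posix_spawn_file_actions_t fa;
    int rc = posix_spawn_file_actions_init(&fa);

    if (rc != 0)
        return rc;
    /* Queue "dup2(out_fd, 1)" to be performed in the child. */
    rc = posix_spawn_file_actions_adddup2(&fa, out_fd, STDOUT_FILENO);
    if (rc == 0)
        rc = posix_spawn(pid_out, path, &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    return rc;
}
```

No caller-written code runs in the child between process creation and exec, which is why this style is usable from a managed runtime.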

> > Another issue is that it is very hard to implement robust error handling without race conditions in the fork-exec model.

> I don't think it is. You just print an error to stderr (write() is safe to call), and you return a bad error code (fork has built-in IPC for error codes via wait() in the parent).

But that isn't robust. How can the parent process reliably distinguish output sent by the child process prior to the exec from output sent by the child process post the exec? Likewise, how can the parent process reliably distinguish an error return value from the child process prior to the exec from an error return value from the exec'd program? It can't.

For truly robust error handling, you'd actually need to do something like this: (1) have a pipe between parent and child process with FD_CLOEXEC set on the child side; (2) the child sends the parent a message "I'm about to exec" before calling exec; (3) the child sends the parent a message saying "exec failed with errno=.." if the exec call fails; (4) if the exec call succeeds, the child process will close its end of the pipe without sending any message post "I'm about to exec". This is my point: robustly handling errors in the fork-exec model is actually quite complex. In a handle/descriptor based API it would be much simpler.

(And the above approach using a pipe isn't perfectly robust – what if the child process crashes for some reason between sending the "I'm about to exec" message and actually calling exec()? It is very difficult for the parent process to reliably distinguish that scenario from some failure in the program being exec()'d.)
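The four steps above can be sketched in C as follows. This is only an illustration under my own assumptions: the names read_full and spawn_checked, the one-byte "E" marker, and sending errno as a raw int are all made up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads up to `len` bytes, stopping early only at EOF.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n == 0)
            break;
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Forks and execs `path`. On return *exec_errno is 0 if the pipe closed
 * after the marker (exec presumably succeeded), the errno from execv()
 * if it failed, or -1 if the child vanished before announcing the exec.
 * Returns the child's pid, or -1 if pipe/fcntl/fork failed. */
pid_t spawn_checked(const char *path, char *const argv[], int *exec_errno)
{
    int fds[2], err = 0;
    char marker = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;
    /* (1) close-on-exec on the child's (write) end */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) == -1 || (pid = fork()) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        n = write(fds[1], "E", 1);            /* (2) "about to exec" */
        execv(path, argv);
        err = errno;
        n = write(fds[1], &err, sizeof err);  /* (3) "exec failed" */
        (void)n;
        _exit(127);
    }
    close(fds[1]);
    if (read_full(fds[0], &marker, 1) != 1) {
        *exec_errno = -1;
    } else {
        n = read_full(fds[0], &err, sizeof err);
        if (n == 0)
            *exec_errno = 0;                  /* (4) EOF after the marker */
        else
            *exec_errno = (n == (ssize_t)sizeof err) ? err : -1;
    }
    close(fds[0]);
    return pid;
}
```

Note that the `*exec_errno = 0` branch is exactly the ambiguous case: EOF after the marker also occurs if the child dies between the marker and the exec. The caller is still responsible for reaping the child with waitpid().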

> Let's say I am using JNA. [...] It is not safe to call fork.

Are you calling fork() from Java, from C, or using the system call number?

Because I'd agree calling it from Java might be unsafe (depends on how Java and JNA interact), but I believe calling it from C or the system call is perfectly fine. And this is in line with what I've written previously.

> But that isn't robust.

It's not supposed to be robust in the way you are describing.

The fork-exec model is low level. It is supposed to be low level. Doing high level things with it is supposed to take some work by the application. That's not a deficiency.

If you build too many things into the low level code, you run into trouble because now you've got 10x as many ways to fail (building your pipes, writing your error messages, marshalling error state, cleaning up, you name it).

Also, some programs will want to do some of those higher level things differently, so instead of baking them into the API and having tons of parameters and paying for some of that overhead (like creating a pipe and writing error messages to the parent for every single fork and exec) you only do that when you want it.

The Windows NT model was built on operating system design thinking from the 1980s, thinking that took far too many years to trickle into the other operating systems whose designs it had been examining.

However, FreeBSD has had process descriptors since roughly 2010. They have the slightly odd semantics of terminating processes when all descriptors to them are closed. But they can be used as descriptors with kqueue() and the like.