by Cloudef 822 days ago
> The everything is a file philosophy becomes much more clear then.

To be honest, everything is a file is kind of a lie in unix. /proc and /sys are pretty much plan9 inspiration.


A more accurate term is that everything is a file descriptor.

The main difference is that plan9 uses read and write for everything, whereas Linux and BSD use ioctls on file descriptors for everything.

Relevant talk by Benno Rice, "What UNIX cost us": https://youtu.be/9-IWMbJXoLM?si=OblWX3OMXWrSFinb

Everything is a descriptor. When I'm opening a TCP connection, there's no file, so calling it a file descriptor feels wrong.

And at that point, the whole "everything is" turns into nonsense, because yes, everything is a pointer to something, so what.

There are named files (which have a file path) and anonymous files (which do not). You can see these in /proc/$PID/fd/$FD if you're curious - when the link doesn't start with '/', it's anonymous. Even process memory is just an anonymous file on Linux, and arguably a cleaner one as it operates on proper fds, instead of plan9 where a string "class name" (not a path) is used to access the magical '#g' filesystem.
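The named/anonymous distinction above can be checked from a script. This is a minimal sketch, assuming a Linux system with /proc mounted; the helper names `describe_fd`, `is_named` and `classify_examples` are made up for illustration.

```python
import os
import socket
import tempfile

def describe_fd(fd: int) -> str:
    """Return the /proc/self/fd link target for an open descriptor (Linux only)."""
    return os.readlink(f"/proc/self/fd/{fd}")

def is_named(fd: int) -> bool:
    # Named files resolve to an absolute path; anonymous ones look like "pipe:[1234]".
    return describe_fd(fd).startswith("/")

def classify_examples() -> dict:
    """Open one file, one pipe and one TCP socket and report which are named."""
    results = {}
    with tempfile.NamedTemporaryFile() as tmp:
        results["regular file"] = is_named(tmp.fileno())
    r, w = os.pipe()
    try:
        results["pipe"] = is_named(r)
    finally:
        os.close(r)
        os.close(w)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        results["tcp socket"] = is_named(s.fileno())
    return results

if __name__ == "__main__":
    for kind, named in classify_examples().items():
        print(f"{kind}: {'named' if named else 'anonymous'}")
```

The temporary file should come out as named, while the pipe and the socket have no path and show up as anonymous.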

The difference from plan9 is not the files, but the way plan9 uses text protocols with read/write to ctl files. To open a TCP connection - if memory serves me right - you first have to write to a ctl file, which creates a new file for the connection. Then you write the dial command to the ctl file of that connection, after which you can open the connection file. On Linux, a syscall creates an anonymous file, and then everything after is operations on this anonymous file.
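For the Linux half of that comparison, here is a rough sketch of "a syscall creates an anonymous file, and everything after is operations on it". It talks to a throwaway loopback listener so it needs no external host; `echo_once` and `roundtrip` are illustrative names, not part of any real API.

```python
import socket
import threading

def echo_once(listener: socket.socket) -> None:
    """Accept one connection and echo back whatever arrives first."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))

def roundtrip(payload: bytes) -> bytes:
    # Throwaway loopback listener so the sketch is self-contained.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=echo_once, args=(listener,))
        server.start()

        # socket(2) hands back an anonymous fd; there is no path to open.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))  # connect(2) on that fd
            client.sendall(payload)              # write side
            reply = client.recv(1024)            # read side
        server.join()
        return reply

if __name__ == "__main__":
    print(roundtrip(b"ping"))
```

No path is ever opened here: the connection only exists as a descriptor, which is the contrast with the ctl-file sequence described above.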

There's some ideological benefits, but plan9 creates a mess of implicit text protocols, ugly string handling, syscall storms and memory inefficiencies. Their design is pretty much solely a limitation caused by the idea that all filesystems should exist through the 9p protocol, which as a networked protocol cannot share data (fds, structs), only copy (payloads to/from read, write). With the idea that all functionality had to be replaceable and mountable from remote machines, the only possible API became read/write for everything.

I'd argue that fd-using syscalls and ioctls - basically per-file syscalls - are a superior approach to implementing everything-as-a-file.

Which one is superior depends on your use case and needs. Plan9's approach is very powerful whenever you need anything distributed, and makes lots of the boilerplate needed to achieve that basically unnecessary. Linux nowadays is flexible enough for both approaches (in theory, the ecosystem might not be there), and I'm glad user namespaces are a thing.

> There's some ideological benefits, but plan9 creates a mess of implicit text protocols, ugly string handling, syscall storms and memory inefficiencies.

On the other hand, linux ioctl and syscalls have infinite binary structs you need to know (and cannot let the compiler reorder fields in), which then doesn't make cross-platform development any easier.

Having to know structs is not really an issue - you also need to know text formats, JSON schemas, what not.

Re-ordering of structs is always forbidden with the binary format being strictly specified, so there's nothing to worry about there. Can't exactly shuffle bytes in a text format either, and plan9 control strings tend to have positional arguments.
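As a small illustration of a strictly specified layout, the sketch below mirrors `struct winsize` (the argument of the TIOCGWINSZ ioctl) with ctypes and prints the byte offset of each field. The `layout` helper is a made-up name for this example.

```python
import ctypes

class WinSize(ctypes.Structure):
    # Mirrors struct winsize from <sys/ioctl.h>: four unsigned shorts,
    # laid out in exactly the declared order.
    _fields_ = [
        ("ws_row", ctypes.c_ushort),
        ("ws_col", ctypes.c_ushort),
        ("ws_xpixel", ctypes.c_ushort),
        ("ws_ypixel", ctypes.c_ushort),
    ]

def layout(struct_type) -> dict:
    """Map each field name to its byte offset inside the struct."""
    return {name: getattr(struct_type, name).offset
            for name, _ in struct_type._fields_}

if __name__ == "__main__":
    print(layout(WinSize), ctypes.sizeof(WinSize))
```

Swapping two entries in `_fields_` would change the offsets and silently break the ioctl, much like swapping positional arguments in a control string.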

The current structs do leave something to be desired though.

I was quite disappointed by the ABI differences between different architectures when I was doing network transparent uinput.

https://git.cloudef.pw/uinputd.git/tree/common/packet.h#n34 https://git.cloudef.pw/uinputd.git/tree/server/uinputd.c#n16

(Excuse to write code to use my PS Vita as gamepad :D)
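One way to see this kind of ABI difference is `struct input_event` from `<linux/input.h>`, which starts with a `struct timeval` and therefore changes size with the width of `long`. The sketch below is only an illustration under that assumption and does not describe what the linked uinputd code does; `WIRE_FMT` is a hypothetical fixed-width format invented for the example.

```python
import struct

# struct input_event as declared in <linux/input.h>:
#   struct timeval time; __u16 type; __u16 code; __s32 value;
NATIVE_FMT = "@llHHi"  # timeval as two native longs: size depends on the ABI
WIRE_FMT = "<qqHHi"    # hypothetical wire format: fixed 64-bit timestamps, little-endian

def native_size() -> int:
    """Size of the event struct as the local ABI would lay it out."""
    return struct.calcsize(NATIVE_FMT)

def to_wire(sec: int, usec: int, ev_type: int, code: int, value: int) -> bytes:
    """Serialize an event into the fixed-size wire format."""
    return struct.pack(WIRE_FMT, sec, usec, ev_type, code, value)

def from_wire(data: bytes) -> tuple:
    """Decode a wire-format event back into its five fields."""
    return struct.unpack(WIRE_FMT, data)

if __name__ == "__main__":
    print("native:", native_size(), "wire:", struct.calcsize(WIRE_FMT))
    print(from_wire(to_wire(1, 500, 1, 30, 1)))  # EV_KEY, KEY_A, pressed
```

The native size typically differs between 32-bit and 64-bit userspace, while the wire format is the same everywhere, which is why a network-transparent tool cannot simply send the raw struct.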

Now, proper plan9-style namespaces, that I miss. :)

User namespaces are still a hell of a lot clunkier than "each process inherits its parents' namespace".

If you set up a user namespace, the child processes will inherit that namespace. The difference is that plan9 is fully built on this idea and isn't multi-user, whereas on Linux you have to opt in to this. It's very useful and underused (mostly used by containers). I wanted to ship my AWS lambdas this way, but sadly AWS lambdas don't allow user namespaces.
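The inheritance part can be observed without creating a new namespace at all. This is a minimal sketch, assuming Linux with /proc mounted; it compares the user namespace identifier of the current process with that of a freshly spawned child. The function names are illustrative.

```python
import os
import subprocess
import sys

def user_ns_id() -> str:
    """Identify the current user namespace, e.g. 'user:[4026531837]' (Linux only)."""
    return os.readlink("/proc/self/ns/user")

def child_user_ns_id() -> str:
    # Spawn a fresh interpreter and ask it the same question.
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.readlink('/proc/self/ns/user'))"],
        check=True,
        capture_output=True,
        text=True,
    )
    return out.stdout.strip()

if __name__ == "__main__":
    print("parent:", user_ns_id())
    print("child: ", child_user_ns_id())
```

Both lines should show the same identifier. Running the script under something like `unshare --user` should change the identifier for parent and child alike, assuming the system permits unprivileged user namespaces.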

https://github.com/aws/aws-lambda-base-images/issues/143

I've only briefly looked at plan9 here and there over the years but you seem to have a pretty good handle on it and maybe could indulge me. Your comment raised a question for me that I hadn't considered before.

If I've got a Plan9 system mounted over, say, NFS, would this all mean that (ignoring permissions) I could effectively open a TCP connection from that remote machine by writing appropriate information to a file on the NFS share? It would be pretty inefficient I suspect, tunnelling TCP over NFS, but it seems like there could be an incredible amount of cool hacks that might Just Work as a side-effect of them going all-in on "everything is a file".

I am not exactly sure where you got the "class name" part from; I've typically referred to those as kernel filesystems or sharp devices. For the record, these kernel filesystems are not technically 9p: they present a very similar interface, but reads and writes to them are not marshaled and unmarshaled from 9p. It is however possible to export their files over 9p if one desires: I can import a remote machine's /net stack and use it to announce or dial out. Plan 9 gives us proto-VPNs just by virtue of its design.

There was perhaps a time when having everything in binary ioctls, bound to one specific device, was necessary in order to reach reasonable performance, but I don't believe that is the case anymore. Anecdotally, these days everything on Plan 9 feels snappier. We have some benchmarks that show that 9front outperforms Linux with naive pipe io and context switches. What Plan 9 misses in micro-optimizations it makes up for by having an incredibly consistent and versatile base.

I want to reiterate the benefits of the network transparency by talking about how drawterm works. Drawterm can be thought of as the plan 9 equivalent of windows RDP. How it works is that internally drawterm creates routines to expose a /dev/draw, /dev/mouse and /dev/keyboard through whichever native way there is on the target system (macos, windows, linux, etc). It then attaches to the remote system and overlays these files over a namespace. Programs like our window manager rio can then be run completely transparently, forwarding not compressed images, but individual draw RPC messages. There is no need for any special code on the plan 9 host side in order to accommodate drawterm; again, it is something that just falls out of the core design of the system.

Even on linux people avoid syscalls because syscalls are slow and bad. So I don't really see the problem with plan9's approach either. Make the common scenario useful, optimize for special cases (sendfile, io_uring). In fact, because read/write lets you batch a bigger amount of data than a single ioctl, it can actually be more performant.

> Their design is pretty much solely a limitation caused by the idea that all filesystems should exist through the 9p protocol, which as a networked protocol cannot share data (fds, structs), only copy (payloads to/from read, write). With the idea that all functionality had to be replaceable and mountable from remote machines, the only possible API became read/write for everything.

It's not clear to me that 9p itself could not be extended to allow for shared memory. With low-level control over the operating system and rebuilding of existing binaries, distributed shared memory becomes a possibility. (I.e. the existing VM system ought to be enough to implement whatever cache coherence is needed for shared memory over the network.)

> magical '#g' filesystem.

What's magical about the segment(3)[1] device? The '#' devices are kernel file servers. There's no magic.

[1] http://man.9front.org/3/segment

Also, a lot of devices require very specific ioctl() commands to work with and don't provide everything as a file.

For example, you can't set the baudrate of a serial port by writing it to some /proc node.
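To make the serial-port case concrete: Python's `termios` module wraps `tcgetattr`/`tcsetattr`, which on Linux are implemented as ioctl requests on the terminal's file descriptor. The sketch below uses a pseudo-terminal as a stand-in for a real serial port (an assumption made so it runs without hardware); `set_baud` and `pty_baud_demo` are illustrative names.

```python
import os
import pty
import termios

def set_baud(fd: int, speed: int) -> int:
    """Set input and output speed on a terminal fd and return the speed read back."""
    attrs = termios.tcgetattr(fd)  # [iflag, oflag, cflag, lflag, ispeed, ospeed, cc]
    attrs[4] = speed
    attrs[5] = speed
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
    return termios.tcgetattr(fd)[5]

def pty_baud_demo() -> int:
    # A pseudo-terminal stands in for a real serial device here.
    master, slave = pty.openpty()
    try:
        return set_baud(slave, termios.B9600)
    finally:
        os.close(master)
        os.close(slave)

if __name__ == "__main__":
    print(pty_baud_demo() == termios.B9600)
```

The speed is a field inside a binary struct passed by ioctl, so there is nothing to `echo` a number into.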

nit: The ioctl syscall targets files just the same as the write syscall.
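A tiny sketch of that point: the same pipe takes a plain write on one end and an ioctl on the other. It uses the FIONREAD request to ask how many bytes are waiting, and assumes Linux; `pending_bytes` and `pipe_demo` are made-up helper names.

```python
import fcntl
import os
import struct
import termios

def pending_bytes(fd: int) -> int:
    """Ask the kernel how many bytes are waiting to be read on fd (FIONREAD)."""
    raw = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]

def pipe_demo() -> int:
    r, w = os.pipe()
    try:
        os.write(w, b"hello")    # plain write(2) on one descriptor
        return pending_bytes(r)  # ioctl(2) on the other
    finally:
        os.close(r)
        os.close(w)

if __name__ == "__main__":
    print(pipe_demo())
```

Both calls take a descriptor as their first argument; only the shape of the payload differs.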

Everything being a file, and everything being read/write calls, are different things. There are pros and cons.