Two Objects Not Namespaced by the Linux Kernel (2017) | HN Mirror


	Two Objects Not Namespaced by the Linux Kernel (2017) (blog.jessfraz.com)
	169 points by setra 2792 days ago

9 comments

haberman 2791 days ago

> The current set of namespaces in the kernel are: mount, pid, uts, ipc, net, user, and cgroup. [...] [Time is] not namespaced. [...] The kernel keyring is another item not namespaced.

I've always argued that "everything is a file" is an exaggeration. These moments make the extent of that exaggeration clear.

If everything truly was a file, the only thing you would need to namespace is the filesystem. But in reality there are a lot of other kernel objects that are not files at all.

zapita 2791 days ago

You are 100% correct. “Everything is a file” was more of an early design insight, which was gradually abandoned as new features were added.

There is a movement of “Unix purists” who lament this deviation from founding principles, and advocate for a return to them. The most notable example is Plan 9.

In Plan 9, everything actually is a file. And exactly as you said, all resources are namespaced via the filesystem. It’s quite elegant and practical.

Sadly Plan 9 has remained a fringe OS, and although it influenced mainstream operating system design in many ways (including the concept of /proc), I wish that influence had been stronger.

AceJohnny2 2791 days ago

I also liked QNX, when I worked with it.

You really did access devices through the /dev/ system, and device-drivers were userspace programs that created files in /dev/.

If your driver crashed, you could kill the userspace driver (which deleted the file under /dev) and restart it (assuming hardware blah blah blah).

Someone 2791 days ago

”device-drivers were userspace programs that created files in /dev/. If your driver crashed, you could kill the userspace driver (which deleted the file under /dev)”

I think that shows not everything is a file. If everything were, you would start the driver by creating the file (say as a hard link from a file in /dev to the driver executable) and kill the driver by rm-ing the file.

(Chances are that, if you follow this through, this idea won’t support everything we want to do with drivers, but if so, that’s an indication that “everything is a file” doesn’t work)

zapita 2791 days ago

To give you a sense of how far Plan 9 took the idea: to open a TCP connection, you open a special file, `/net/tcp/clone`, which allocates a new connection directory (e.g. `/net/tcp/4`) and gives you a descriptor for that connection's `ctl` file. You then write newline-terminated text commands to that descriptor, which now represents your socket. The connection itself shows up as a directory (`ctl`, `data`, `status` and friends), so you can browse it like any other part of the filesystem.
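As a rough illustration of that sequence, here is a Python sketch that uses raw file descriptors. It assumes the Plan 9 conventions that reading a protocol's `clone` file yields the new connection number and that the connection's byte stream lives in a `data` file next to `ctl`. The function name `plan9_dial` and its `net_root` parameter are hypothetical; they exist only so the steps can be exercised against a mock directory tree on an ordinary system.

```python
import os


def plan9_dial(net_root: str, proto: str, addr: str) -> tuple[int, int, str]:
    """Sketch of the Plan 9 dial sequence using raw file descriptors.

    On Plan 9, net_root would be "/net" and addr would look like
    "10.0.0.1!80". Returns (ctl_fd, data_fd, connection_directory).
    """
    # Opening the clone file allocates a new connection; the descriptor we
    # get back behaves as that connection's ctl file.
    ctl_fd = os.open(os.path.join(net_root, proto, "clone"), os.O_RDWR)
    # Reading it yields the new connection's number as text, e.g. "4".
    conn = os.read(ctl_fd, 64).decode().strip()
    conn_dir = os.path.join(net_root, proto, conn)
    # Control is plain text: one newline-terminated command per write.
    os.write(ctl_fd, f"connect {addr}\n".encode())
    # The byte stream is just another file in the connection directory.
    data_fd = os.open(os.path.join(conn_dir, "data"), os.O_RDWR)
    return ctl_fd, data_fd, conn_dir
```

Against a mock tree the `connect` command simply lands in a regular file; on Plan 9 the kernel's network device interprets it and sets up the connection.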

AceJohnny2 2791 days ago

Great point.

pjmlp 2791 days ago

It might have been elegant, but high-performance graphics rendering wasn't something Rio (Plan 9's window system) was able to do.

temac 2791 days ago

Hm, is NT a purer Unix than Unix, then? After all, it keeps all its objects in a filesystem-like tree...

tadfisher 2791 days ago

In Windows NT, everything is an object. This is derived from VMS, which is essentially NT's predecessor (a principal designer of both being David Cutler of Digital Research and later Microsoft).

jackfraser 2791 days ago

The problem I always had with this was that Windows has this whole layer of objects and what might even be elegance in places all hidden under the hood, and unless you're a C++ hacker you can't actually work with most of it. CMD and the GUI tools never exposed half of it to you; PowerShell helps, but it's all still very hidden and hard to get to.

In comparison, Unix provides all the tools needed to take it apart and put it back together again. When you do need to interact with some syscall interface, there's almost always a complete CLI around it. It really makes it much easier to get into the nooks and crannies, inconsistencies aside.

yjftsjthsd-h 2791 days ago

Yeah, as a person who hates Windows and loves all things unix, the NT kernel and underlying system have long struck me as a well-designed, nice system... with a poor userland and a terrible UI on top. But the kernel is nice.

atombender 2791 days ago

He's talking about the kernel. The Windows NT kernel and the Windows APIs use handles to represent kernel/API objects, and every handle has things like security associated with it.

For example, CreateProcess() gives you back a HANDLE value representing the process, and you can close it with CloseHandle(). Everything is a HANDLE: Files, pipes, threads, etc. A notable exception is sockets, which for historical reasons use an API modeled on BSD sockets.

The object stuff you're talking about is presumably COM, which is different. COM is great, but has nothing to do with the kernel.

martyvis 2791 days ago

David Cutler developed VMS at Digital Equipment Corporation (DEC). Digital Research was a different company - it developed CP/M.

caf 2791 days ago

> I've always argued that "everything is a file" is an exaggeration.

This is true, but also bear in mind that "everything is a file" didn't mean "everything is represented by a name in the mount tree", it really meant "(almost) everything is referred to by a file descriptor".

It's still true that the most painful things to deal with are the ones that aren't represented by a file descriptor.

yayana 2791 days ago

I've always thought of it as the preferred interface to userland when there isn't an overriding factor.

Within a kernel it seems like no one cares how the sausage is made.

DonHopkins 2791 days ago

If time were a file, you could gzip it up to compress it, and store it away for later.

Time files like an arrow!

steffan 2791 days ago

Fruit files like an Apple?

tyingq 2791 days ago

I agree it's unfortunate, but it doesn't really seem in conflict with "everything is a file".

Making it a file is separate from making it sensible/usable for containers. Take the /proc filesystem: its entries are "files", but many of them don't work as expected inside a container without something like lxcfs. /proc/uptime, for example.
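For reference, /proc/uptime holds two numbers: seconds since boot, and idle time summed across all CPUs. A container without lxcfs reads the host's values here, so a freshly started container can report days of uptime. A small parsing sketch (helper names are hypothetical):

```python
def parse_uptime(text: str) -> tuple[float, float]:
    """Parse /proc/uptime: '<seconds since boot> <idle seconds summed over CPUs>'."""
    up, idle = text.split()
    return float(up), float(idle)


def read_uptime(path: str = "/proc/uptime") -> tuple[float, float]:
    # Inside a container, this is the host's boot time unless something
    # like lxcfs overlays the file.
    with open(path) as f:
        return parse_uptime(f.read())
```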

sytelus 2791 days ago

The abstraction is not really a file but a stream of bytes. It turns out any object that is a stream of bytes will need a similar set of operations: open, create, read, write, close, seek, etc. This is a fairly generic and powerful abstraction.

cyphar 2791 days ago

The abstraction is a file descriptor. Not all things represented by file descriptors support read(2) or _llseek(2), but by representing them as file descriptors you can reuse other machinery, like passing file descriptors over AF_UNIX sockets (SCM_RIGHTS).
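A minimal sketch of that reuse, assuming Linux and Python 3.9+ (`socket.send_fds`/`socket.recv_fds` wrap SCM_RIGHTS). A pipe is used only because it's easy to check; any descriptor-backed object could be passed the same way. The function name is hypothetical.

```python
import os
import socket


def pass_pipe_over_unix_socket() -> bytes:
    """Send a pipe's read end over an AF_UNIX socket and read through the copy."""
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    # SCM_RIGHTS: the kernel installs a duplicate of `r` in the receiver.
    socket.send_fds(parent, [b"fd"], [r])
    _msg, fds, _flags, _addr = socket.recv_fds(child, 1024, 1)
    received = fds[0]            # a new descriptor number, same pipe
    os.write(w, b"hello through a passed fd")
    data = os.read(received, 1024)
    for fd in (r, w, received):
        os.close(fd)
    parent.close()
    child.close()
    return data
```

Here both ends live in one process. With another process on the far side of the socket, the same two calls hand the pipe across the process boundary.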

fulafel 2791 days ago

Yeah, this was one of the headline changes in Plan 9 (the second OS made by the Fathers of Unix).

pjmlp 2791 days ago

And improved on in Inferno, which fixed some of Plan 9's flaws: the third OS made by the Fathers of Unix, and the one HNers keep forgetting about.

madhadron 2791 days ago

"Everything is a file" hasn't been true for Unix since almost the beginning. It's kind of like the Unix philosophy of small, independent tools... except for the database where you store all the important data.

lunchables 2791 days ago

I thought "everything is a file" referred to user land.

wmf 2792 days ago

Since this was written a time namespace was proposed: https://www.phoronix.com/scan.php?page=news_item&px=Linux-Ti...

DonHopkins 2791 days ago

This proposal implements clock offsets, but does it support continuous time scaling? One clock-handy use case would be to run your programs really slow or fast (or backwards!), for testing purposes.

Kaleida Labs' ScriptX (a multimedia programming language kinda like Dylan with classes) had built-in support for hierarchical clocks within the container (in the sense of "window", not "VM") hierarchy. The same way every window or node has a 2D or 3D transformation matrix, each clock has a time scale and offset relative to its parent, so anything that consumes time (like a QuickTime player, or a simulation) runs at the scaled and offset time that it inherits through its parent containers. And you can move and scale each container around in time as necessary, to pause movies or simulations. You could even set the scale to a negative number, and it played QuickTime movies backwards! (That was pretty cool in 1995 -- try playing a movie backwards in the web browser today!)
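The scale-and-offset idea can be sketched in a few lines. This is not ScriptX's API; `Clock`, `time` and `set_rate` are hypothetical names. Each clock computes local time as `offset + rate * parent_time`, and changing the rate rebases the offset so that pausing (rate 0) or reversing (negative rate) doesn't make the clock jump.

```python
from typing import Optional


class Clock:
    """Hypothetical hierarchical clock: local = offset + rate * parent_time."""

    def __init__(self, parent: Optional["Clock"] = None,
                 rate: float = 1.0, offset: float = 0.0) -> None:
        self.parent = parent
        self.rate = rate
        self.offset = offset

    def _parent_time(self, root_time: float) -> float:
        # The root clock's "parent" is the externally supplied hardware time.
        return root_time if self.parent is None else self.parent.time(root_time)

    def time(self, root_time: float) -> float:
        return self.offset + self.rate * self._parent_time(root_time)

    def set_rate(self, new_rate: float, root_time: float) -> None:
        """Change speed (0 pauses, negative runs backwards) without a jump."""
        p = self._parent_time(root_time)
        now = self.offset + self.rate * p
        self.rate = new_rate
        # Rebase the offset so local time is continuous at the switch point.
        self.offset = now - new_rate * p
```

Usage: a window clock at `rate=2.0` makes every child run double speed. Calling `set_rate(0.0, t)` on a movie clock freezes it at its current time, and `set_rate(-1.0, t)` then plays it backwards from there.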

http://www.art.net/~hopkins/Don/lang/scriptx/tech-qa.html

Q: How does the ScriptX core class library compare to class libraries available with other programming languages (e.g. MFC, OWL, MacApp, or TCL)?

A: The ScriptX core class library has many similarities to other object oriented frameworks in that it provides many basic services common to all applications built on them. All frameworks provide classes for creating windows, handling keyboard and mouse events, reading and writing files, etc.

Where ScriptX is unique is in its focus. The ScriptX core classes are oriented towards interactive, media rich applications. For example, clocks and timing are fundamental in the ScriptX class library; most other frameworks have no concept of timing built in.

ScriptX also tightly integrates media data (bitmaps, video, audio) with the class library, and hides the details of storing, retrieving, and presenting media to the user.

Q: What are the major sections of the core class library?

Clocks, players, and animation.

Time is a fundamental element of the core classes. Starting with basic clocks, subclasses extend the capabilities for animation, video, and audio playback.

Clocks can be tied to underlying hardware clocks or slaved in a hierarchical fashion to other clock objects. Varying the rate of a master clock, all sub-clocks will stay synchronized to the master clock, permitting the programmer to precisely control time in a title. Clock hierarchies also free the programmer from dealing with differences in performance between slower and faster CPUs.

Player classes build upon clocks. These classes allow you to create and play sequences of actions that take place over time. These sequences can be used to create animations as well as control other presentation elements such as video or sound.

A special type of clock object, the action list player, can be used to play actions in sequence at specified times. Various action objects are added to an action list specifying the time at which the action is to occur. Action objects are used to move graphic elements on the screen, execute ScriptX code, or modify the action list.

Other player classes provide simple ways to play digital video, audio, and MIDI. As with all players, the clocks underlying these players can be sped up, slowed down, or run backwards.

Q: Can video be synchronized with other events?

A: Yes, internal video players are based on ScriptX clocks and can be slaved together to provide synchronization with animations and other events. For example, buttons can appear in a window at precise times based on video playback.
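The action list player described in that FAQ boils down to a time-sorted schedule that a clock drives forward. A minimal sketch with hypothetical names (not ScriptX's classes). It only handles forward playback; a clock running backwards would need the reverse walk.

```python
import bisect
from typing import Callable


class ActionList:
    """Hypothetical action-list player: fire callbacks as a clock passes them."""

    def __init__(self) -> None:
        self._times: list[float] = []
        self._actions: list[Callable[[], None]] = []
        self._last = float("-inf")

    def add(self, at: float, action: Callable[[], None]) -> None:
        # Keep the schedule sorted; equal times fire in insertion order.
        i = bisect.bisect_right(self._times, at)
        self._times.insert(i, at)
        self._actions.insert(i, action)

    def advance(self, now: float) -> None:
        """Run every action scheduled in (last, now], in time order."""
        lo = bisect.bisect_right(self._times, self._last)
        hi = bisect.bisect_right(self._times, now)
        for action in self._actions[lo:hi]:
            action()
        self._last = max(self._last, now)
```

Feeding `advance()` from a scaled or slaved clock rather than wall time is what keeps, say, a button's appearance locked to a video's playback position.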

cyphar 2791 days ago

> This proposal implements clock offsets, but does it support continuous time scaling?

No. The main reason is that it's very difficult to do with the current timekeeping machinery within the kernel. Some people also want the ability to freeze the current time, which is similarly difficult -- and in some cases harder, because then what should CLOCK_MONATONIC give you? There's also the fact that there's currently no interface to set the "clock speed" or do any of these things.

Making time go backwards would, I think, simply be impossible, because many things in the kernel that interact with time probably make the (reasonable) assumption that time goes forwards. Also, CLOCK_MONATONIC would do the exact opposite in such circumstances.

cperciva 2791 days ago

You mean "CLOCK_MONOTONIC", not "CLOCK_MONATONIC". (I'm guessing this is a misspelling, not a typo, since it appeared twice.)

And the simple answer is that if time stops, CLOCK_MONOTONIC always returns the same time. This is perfectly fine given correct software; CLOCK_MONOTONIC is guaranteed not to go backwards, but it is not guaranteed to always go forward. One could imagine, for example, a system with a very inaccurate clock where CLOCK_MONOTONIC simply counts days.
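In code, "correct software" means checking for non-decreasing readings, never strictly increasing ones. A small sketch using Python's `time.clock_gettime_ns` (Unix). The helper names are hypothetical.

```python
import time


def monotonic_samples(n: int = 10_000) -> list[int]:
    """Read CLOCK_MONOTONIC n times in a tight loop (nanoseconds)."""
    return [time.clock_gettime_ns(time.CLOCK_MONOTONIC) for _ in range(n)]


def is_non_decreasing(samples: list[int]) -> bool:
    # The guarantee is "never goes backwards": equal neighbours are allowed,
    # and on a coarse or frozen clock they are expected.
    return all(a <= b for a, b in zip(samples, samples[1:]))
```

Code that instead asserts `b > a`, or divides by an elapsed time that can be zero, is exactly the incorrect software this argument excludes.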

OskarS 2791 days ago

> This proposal implements clock offsets, but does it support continuous time scaling? One clock-handy use case would be to run your programs really slow or fast (or backwards!), for testing purposes.

What use-case would you have for this? Making sure your program runs properly in the near presence of a black hole?

guipsp 2791 days ago

Your snark aside, clocks are not perfect, and a malfunctioning clock might speed up or slow down.

Also, speeding up the clock is a technique already used in testing environments [1].

[1]: https://github.com/majek/fluxcapacitor

OskarS 2790 days ago

Sorry, I honestly didn't mean to be snarky. It was a genuine question. I couldn't really see the real-case justification for testing a clock that slowed down, sped up, or went backwards. But you're right, malfunctioning clocks would be an example.

derefr 2792 days ago

I wonder whether namespacing time would also result in those namespaces being able to have separate "clocks" (time backends? time schedulers?) that progress at different rates, or for different reasons.

Being able to put a process into a time namespace with a deterministic "clock" would obviate a large benefit of http://www.zerovm.org/.

Also, having "clock slew" be a matter of perspective—with processes that can handle leap seconds seeing them happen instantaneously; and processes that can't handle leap-seconds, seeing slewed time—would be nice. Then you could have different system facilities that care about monotonic time, vs. synced to calendar time, vs. one second per second time, all having that kind of time available to them as "the time", rather than through different APIs.
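One way to picture the "matter of perspective" idea is as two views of the same uniform timeline around a positive leap second. The stepped view applies the whole second at once, so POSIX-style time repeats a second. The smeared view spreads it evenly over a window. This is a sketch under stated assumptions: `t` counts every elapsed second, the smear is linear, and the function names and window length are hypothetical.

```python
def stepped(t: float, leap_at: float, leap: float = 1.0) -> float:
    """Stepped view: the leap second is applied all at once (a second repeats)."""
    return t if t < leap_at else t - leap


def smeared(t: float, start: float, window: float, leap: float = 1.0) -> float:
    """Smeared view: absorb `leap` seconds linearly over [start, start + window]."""
    frac = min(max((t - start) / window, 0.0), 1.0)
    return t - leap * frac
```

With the smear window ending at the leap, both views agree before the window and after the leap. In between, the smeared clock simply runs slightly slow, which is what a process that can't handle leap seconds would want to see.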

rwmj 2791 days ago

Accelerated time might also be a way to test programs. It's similar to techniques used to test planes (by repeatedly pressurising and depressurising them). It might, for example, reveal race conditions faster in programs that ordinarily do a lot of sleeping. I wrote a bit more about this (unproven) idea here: https://rwmj.wordpress.com/2010/10/14/half-baked-ideas-accel...
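Tools like fluxcapacitor and libfaketime apply this from outside the program. The in-process version of the same trick is to inject a clock whose `sleep()` advances virtual time instead of blocking, so code that does a lot of sleeping runs instantly under test. A minimal sketch: `VirtualClock` and `retry_with_backoff` are hypothetical names. Unlike the process-level tools, this only works for code written to take a clock parameter.

```python
class VirtualClock:
    """Hypothetical fake clock: sleep() advances virtual time instantly."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        if seconds < 0:
            raise ValueError("sleep length must be non-negative")
        self.now += seconds          # no real waiting happens


def retry_with_backoff(op, clock, attempts: int = 5, base: float = 1.0):
    """Example code under test: exponential backoff that normally sleeps."""
    for i in range(attempts):
        if op():
            return i + 1             # number of attempts it took
        clock.sleep(base * 2 ** i)
    return None
```

Three failures followed by a success "take" 1 + 2 + 4 = 7 virtual seconds but return immediately. In production the same function would receive a thin wrapper around `time.time`/`time.sleep`.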

kbenson 2792 days ago

> Also, having "clock slew" be a matter of perspective—with processes that can handle leap seconds seeing them happen instantaneously; and processes that can't handle leap-seconds, seeing slewed time—would be nice.

I imagine there might be some really interesting (for meanings of interesting that include shoot me now) and hard to track down bugs as you deal with inconsistent clocks not just across systems within a network, but processes within a single system.

derefr 2791 days ago

> I imagine there might be some really interesting (for meanings of interesting that include shoot me now) and hard to track down bugs as you deal with inconsistent clocks not just across systems within a network, but processes within a single system.

I feel like the "safe assumption" that the other end of a given IPC channel (or even inter-thread communication channel) is on the same machine is responsible for the vast majority of failures we see in e.g. Jepsen testing of databases.

After all, in sufficiently-large computers (i.e. HPC clusters that pretend to be one "computer"), you've got NUMA zones that are light-microseconds away from one another, where even threads of the same process can literally end up needing vector clocks to linearize events between themselves.

It probably wouldn't be too bad a thing if things like the Linux base-system used only internal IPC mechanisms that exposed this unreliability (like e.g. Erlang does with "unreliable async message passing" as its IPC primitive), forcing each component to deal with the fact that its peers may or may not be netsplit away from it.

Even if that scenario will only come up if you're writing code to get your GPS position from a Dyson sphere of 10-mile-deep Matrioshka brains.
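For readers who haven't met them: a vector clock gives each participant its own counter and ships the whole vector with every message. Events can then be ordered by causality without trusting any shared wall clock. A minimal textbook sketch, not tied to any particular HPC runtime. The names are illustrative, and missing entries count as 0.

```python
from dataclasses import dataclass, field


@dataclass
class VectorClock:
    """Minimal vector clock for one named participant (thread, process, node)."""
    node: str
    clock: dict[str, int] = field(default_factory=dict)

    def tick(self) -> dict[str, int]:
        """Local event or send: bump our own entry and return a snapshot."""
        self.clock[self.node] = self.clock.get(self.node, 0) + 1
        return dict(self.clock)

    def receive(self, other: dict[str, int]) -> None:
        """Merge a received timestamp (element-wise max), then tick."""
        for k, v in other.items():
            self.clock[k] = max(self.clock.get(k, 0), v)
        self.tick()


def happened_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if a causally precedes b: a <= b everywhere and < somewhere."""
    keys = a.keys() | b.keys()
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))
```

Two events where neither `happened_before` the other are concurrent. That is precisely the case a single-machine program usually assumes can't happen.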

kbenson 2791 days ago

I bet that assumption is responsible for a large number of problems. I also think it's correct enough most of the time, and relied on enough, that if it suddenly and frequently stopped being true, we'd see our carefully crafted applications for what they really are: a pile of assumptions that sometimes have little relation to reality.

chatmasta 2791 days ago

IIRC Docker for Mac had a bug like this for a long time where the clocks of containers would become wildly out of date.

TheDong 2791 days ago

More accurately, the clocks of the linux virtual machine running docker containers would differ from the OSX clock.

Those aren't really containers skewing from other processes on the same system, as the parent describes, but clocks skewing between two different systems (which is a totally normal thing we deal with regularly).

cyphar 2791 days ago

There is a time namespace proposal[1], but currently the answer to this question is no. The reason is that timekeeping is incredibly complicated within the kernel (for instance -- when userspace gets the current time, it's read from a vDSO page that the kernel injects into every process and thus is updated by the kernel asynchronously). Adding different clock speeds is already non-trivial, let alone switching out different time backends.
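The vDSO page mentioned here is visible from userspace. Every process's /proc/self/maps has a `[vdso]` entry, and on x86-64 the kernel-updated time data typically sits in a neighbouring `[vvar]` mapping. Because clock_gettime(2) and gettimeofday(2) are normally answered from this page without a real system call, tools working at the syscall boundary never see them. A small parser sketch (helper names are hypothetical):

```python
from typing import Optional, Tuple


def find_vdso(maps_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the [vdso] mapping from /proc/<pid>/maps text."""
    for line in maps_text.splitlines():
        fields = line.split()
        # The pathname column holds pseudo-names like [vdso], [vvar], [stack].
        if fields and fields[-1] == "[vdso]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


def own_vdso() -> Optional[Tuple[int, int]]:
    with open("/proc/self/maps") as f:
        return find_vdso(f.read())
```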

The current time namespace proposal just allows you to set the current time separately from the host, which is actually quite a difficult thing to do already (it takes 20 patches)...

[1]: https://lore.kernel.org/lkml/alpine.DEB.2.21.1810022310360.1...
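For reference, the time namespace that was eventually merged (Linux 5.6) ended up offsetting only CLOCK_MONOTONIC and CLOCK_BOOTTIME, not CLOCK_REALTIME. The offsets are written through a per-process `timens_offsets` file. A sketch, assuming root, Linux 5.6+ and Python 3.12+ (for `os.unshare` and `os.CLONE_NEWTIME`). The helper names are hypothetical, and only the formatter runs without privileges.

```python
import os
import time


def timens_offsets_line(clock: str, seconds: int, nanoseconds: int = 0) -> str:
    """Format one line for /proc/<pid>/timens_offsets (Linux 5.6+)."""
    # Only these two clocks can be offset; CLOCK_REALTIME stays shared.
    if clock not in ("monotonic", "boottime"):
        raise ValueError("time namespaces only offset monotonic and boottime")
    if not 0 <= nanoseconds < 1_000_000_000:
        raise ValueError("nanoseconds must be in [0, 1e9)")
    return f"{clock} {seconds} {nanoseconds}\n"


def child_monotonic_with_offset(seconds: int) -> float:
    """Fork a child into a new time namespace and return its CLOCK_MONOTONIC.

    Needs privileges (e.g. root) and Python 3.12+ for os.unshare.
    """
    # unshare(CLONE_NEWTIME) only affects children created afterwards...
    os.unshare(os.CLONE_NEWTIME)
    # ...and the offsets must be written before the first child exists.
    with open("/proc/self/timens_offsets", "w") as f:
        f.write(timens_offsets_line("monotonic", seconds))
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: first member of the new namespace
        os.close(r)
        os.write(w, repr(time.clock_gettime(time.CLOCK_MONOTONIC)).encode())
        os._exit(0)
    os.close(w)
    value = float(os.read(r, 64))
    os.waitpid(pid, 0)
    os.close(r)
    return value
```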

zapita 2791 days ago

Is zerovm still active? I loved the concept, but the startup behind it is gone, and it’s built on NaCl, which is being deprecated in favor of wasm... I would love to see it ported to wasm and expanded beyond the original “python compute embedded in openstack storage” use case, which was underwhelming. There are so many more exciting applications for server-side wasm. I hope someone actually builds this.

At one point I got my hopes up that Docker would build this as the logical next step after Linux containers... But they seem to be focused on monetizing the containers/kubernetes movement, which makes sense as a business decision but still is disappointing.

derefr 2791 days ago

Considering that PNaCl was made for running untrusted, user-supplied native code in a sandbox-environment resembling that of native Linux binaries; and was used for this in e.g. Google App Engine to build the various first-generation container runtimes...

...and considering that gVisor (https://github.com/google/gvisor) is now used by Google for that same use-case...

...then perhaps gVisor (or a thin "make everything deterministic" layer on top of it) could be looked at as something like a "spiritual successor" to ZeroVM?

vlovich123 2791 days ago

Couldn't that just be done as part of libfaketime? Now granted it's harder to do an entire OS with that but you could run it within a VM that itself is run by libfaketime, no?

theamk 2791 days ago

I personally miss core pattern namespacing. I would love to give some of my containers a custom coredump handler, but this is impossible.

And in general, a sysctls settings namespace would be really useful. Sure, sometimes it makes no sense to namespace a setting, but net.ipv4.tcp_congestion_control for example? I'd love to be able to change it without modifying the code.
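Both settings mentioned here are ordinary files under /proc/sys, which is part of why per-container values feel natural. Below is a sketch of reading them, plus Linux's existing per-socket `TCP_CONGESTION` option. That option sidesteps the global default but, as the comment notes, requires changing the program. Helper names are hypothetical, and `"reno"` is used because it is always built in.

```python
import os
import socket


def sysctl_path(name: str) -> str:
    """Map a dotted sysctl name (as used by sysctl(8)) to its /proc/sys file."""
    # Simplification: assumes no component itself contains a dot
    # (VLAN interface names such as eth0.100 do).
    return os.path.join("/proc/sys", *name.split("."))


def read_sysctl(name: str) -> str:
    with open(sysctl_path(name)) as f:
        return f.read().strip()


def tcp_socket_with_cc(algorithm: bytes = b"reno") -> socket.socket:
    """Override congestion control for a single socket (Linux TCP_CONGESTION)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, algorithm)
    return s
```

`read_sysctl("kernel.core_pattern")` returns the single host-wide pattern that every container shares today.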

vxNsr 2791 days ago

Meta: this is from 2017.

Super interesting though, the keyring thing especially seems to have broader implications...

tyingq 2792 days ago

Syslog seems to be on the proposal list as well.

lalaithion 2792 days ago

Why is this the case? No one has bothered to do it? It would break backwards compatibility? Linus thinks it's a bad idea?

jchw 2792 days ago

Shouldn't break backwards compatibility. More than anything, my guess is that it's just a result of most of Linux's modern-day design having been implemented before the era of containers. AFAIK, namespaces+cgroups were never meant to support complete isolation.

simcop2387 2791 days ago

The time namespace is being worked on, it's a very difficult problem because of how pervasive time is in the kernel.

Here's a recent in depth LWN article about the topic. https://lwn.net/Articles/766089/

As for the keyring stuff, I haven't heard of any work being done, but I don't know of any reason it shouldn't be doable.

emmelaich 2792 days ago

Probably merely because it's hard to do and no one has sufficient motivation.

I can think of one good use case -- Y2K-style problems.

Also sometimes apps are tied to external events like legislation. It would be good to set the time forward for testing.

You can sort of do this with LD_PRELOAD but it can get hairy.

Also see @wmf's comment above.

etaoins 2792 days ago

Another use case is dealing with tokens that assume globally synchronised clocks such as JWTs and Kerberos/Active Directory. Ideally all clocks would be perfectly synchronised but things happen.

For example, you might have one container that’s exchanging JWTs with a microservice that should be using AWS’s NTP servers and another that’s joined an Active Directory domain that should be using the AD NTP server. Right now you either need to run them on separate machines or expose yourself to interesting problems if clock skew happens.
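The usual mitigation on the token side is a small leeway when checking time claims. Here is the idea in a few lines, following RFC 7519's `exp`/`nbf` semantics; the function name and 60-second default are hypothetical. A leeway only papers over modest skew. It can't give two containers on one host different clock sources, which is the real problem here.

```python
from typing import Mapping, Optional


def token_time_valid(claims: Mapping[str, float], now: float,
                     leeway: float = 60.0) -> bool:
    """Check JWT-style 'nbf'/'exp' claims (seconds since the epoch) with a skew allowance."""
    nbf: Optional[float] = claims.get("nbf")
    exp: Optional[float] = claims.get("exp")
    # Reject only if still too early after assuming our clock may be `leeway` slow.
    if nbf is not None and now + leeway < nbf:
        return False
    # Reject only if expired even after assuming our clock may be `leeway` fast.
    if exp is not None and now - leeway >= exp:
        return False
    return True
```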

briffle 2791 days ago

If there is one thing Y2K taught us, it's to ignore any worry about the 2038 problem until 2036, then make a HUGE deal out of it.

https://en.wikipedia.org/wiki/Year_2038_problem
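To see the boundary itself: a signed 32-bit time_t runs out at 2^31 - 1 seconds after the epoch, and the next second wraps to a date in 1901. This Python sketch emulates the 32-bit storage with `struct`, assuming the two's-complement wraparound typical platforms exhibit. The helper names are hypothetical.

```python
import datetime
import struct

UTC = datetime.timezone.utc
TIME32_MAX = 2**31 - 1


def as_time32(seconds: int) -> int:
    """Truncate to a signed 32-bit time_t (two's-complement wraparound)."""
    return struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]


def show(seconds: int) -> str:
    """Render seconds since the epoch as an ISO 8601 UTC timestamp."""
    return datetime.datetime.fromtimestamp(seconds, UTC).isoformat()
```

`show(TIME32_MAX)` gives `2038-01-19T03:14:07+00:00`, and `show(as_time32(TIME32_MAX + 1))` gives `1901-12-13T20:45:52+00:00`.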

cyphar 2791 days ago

Linux and glibc have been working on 2038 problems for at least the past decade.

pjmlp 2791 days ago

There are plenty of other POSIX platforms out there.

Sharlin 2791 days ago

I’m not sure that people who think ”containers are just like VMs” should have any business working with containers.

timeattack 2792 days ago

You can't change the time in a container, but it's possible to change its timezone files.

By generating fake timezones, you can effectively change the time in a container.

cyphar 2791 days ago

This doesn't change what gettimeofday(2) gives you (and actually you can't even use ptrace easily to fake the time of day because gettimeofday(2) isn't a real syscall -- it's actually implemented as a read from the vDSO page the kernel maps into every process).
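Here is the distinction sketched in Python on a Unix system. A timezone setting changes how timestamps are rendered as local time, but not the timestamp itself. The example uses a made-up POSIX TZ string, `FAKE-5`, meaning five hours ahead of UTC; a fake zoneinfo file behaves the same way. The helper name is hypothetical.

```python
import os
import time


def render_under_tz(tz: str, timestamp: float) -> time.struct_time:
    """Render `timestamp` as local time with TZ temporarily set (Unix only)."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()                      # re-read TZ in the C library
    try:
        return time.localtime(timestamp)
    finally:
        # Restore the previous setting so the rest of the process is unaffected.
        if saved is None:
            del os.environ["TZ"]
        else:
            os.environ["TZ"] = saved
        time.tzset()
```

The same instant (timestamp 0) renders as hour 0 under `UTC0` and hour 5 under `FAKE-5`, while `time.time()` keeps returning the kernel's value either way.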