> The current set of namespaces in the kernel are: mount, pid, uts, ipc, net, user, and cgroup. [...] [Time is] not namespaced. [...] The kernel keyring is another item not namespaced.
I've always argued that "everything is a file" is an exaggeration. These moments make the extent of that exaggeration clear.
If everything truly was a file, the only thing you would need to namespace is the filesystem. But in reality there are a lot of other kernel objects that are not files at all.
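For what it's worth, on Linux the namespaces a process belongs to are themselves visible as entries under /proc/<pid>/ns, so you can at least enumerate them like files. A quick sketch, assuming a Linux box with /proc mounted (the function name and the `ns_dir` parameter are made up for illustration):

```python
import os

def list_namespaces(ns_dir="/proc/self/ns"):
    """Return {entry name: symlink target} for each namespace entry in ns_dir."""
    result = {}
    if not os.path.isdir(ns_dir):
        return result  # not Linux, or /proc is not mounted
    for name in sorted(os.listdir(ns_dir)):
        path = os.path.join(ns_dir, name)
        try:
            # Each entry is a symlink whose target names the namespace type
            # followed by an identifier in square brackets.
            result[name] = os.readlink(path)
        except OSError:
            result[name] = None  # entry exists but could not be read
    return result

if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20} {ident}")
```

Which entries show up depends on the kernel version, so don't expect the list to match the article exactly.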
You are 100% correct. “Everything is a file” was more of an early design insight, which was gradually abandoned as new features were added.
There is a movement of “Unix purists” who lament this deviation from founding principles, and advocate for a return to them. The most notable example is Plan 9.
In Plan 9, everything actually is a file. And exactly as you said, all resources are namespaced via the filesystem. It’s quite elegant and practical.
Sadly Plan 9 has remained a fringe OS, and although it influenced mainstream operating system design in many ways (including the concept of /proc), I wish that influence had been stronger.
> device-drivers were userspace programs that created files in /dev/. If your driver crashed, you could kill the userspace driver (which deleted the file under /dev)
I think that shows not everything is a file. If everything were, you would start the driver by creating the file (say as a hard link from a file in /dev to the driver executable) and kill the driver by rm-ing the file.
(Chances are that, if you follow this through, this idea won’t support everything we want to do with drivers, but if so, that’s an indication that “everything is a file” doesn’t work)
To give you a sense of how far Plan 9 took the idea... To open a TCP connection, you create a special “control file” at `/net/tcp/ctl` or some similar path, then write newline-terminated text commands to the file descriptor. That descriptor now represents your socket. You can also browse its contents as a directory (in Plan 9 each node in the filesystem can be both a regular file and a parent directory).
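To make that concrete, here is a rough sketch of the dial sequence in Python. As far as I recall the file you open is `/net/tcp/clone`: reading it gives you the number of a fresh connection, you write a `connect` command to it, and the bytes then flow through `/net/tcp/<n>/data`. Treat the exact paths and the command format as my assumption, not gospel. `dial_tcp` and `net_root` are hypothetical names, and this only does anything useful on Plan 9 itself (or against a tree that behaves like its /net).

```python
import os

def dial_tcp(addr, port, net_root="/net"):
    """Plan 9 style dial. Returns (ctl_file, data_file), both unbuffered."""
    # Opening "clone" reserves a new connection; the descriptor we get back
    # acts as that connection's control file.
    ctl = open(os.path.join(net_root, "tcp", "clone"), "r+b", buffering=0)
    # Reading it yields the connection number as text.
    conn = ctl.read(32).decode().strip()
    # Plain-text command: address and port separated by "!".
    ctl.write(f"connect {addr}!{port}".encode())
    # The payload is read from and written to the per-connection data file.
    data = open(os.path.join(net_root, "tcp", conn, "data"), "r+b", buffering=0)
    return ctl, data
```

Note there is no socket(), connect() or setsockopt() anywhere: it's open, read and write all the way down, which is why namespacing the filesystem is enough to namespace the network.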
In Windows NT, everything is an object. This is derived from VMS, which is essentially NT's predecessor (a principal designer of both being David Cutler of Digital Equipment Corporation and later Microsoft).
The problem I always had with this was that Windows has this whole layer of objects, and what might even be elegance in places, all hidden under the hood, and unless you're a C++ hacker you can't actually work with most of it. CMD and the GUI tools never exposed half of it to you; PowerShell helps, but it's all still very hidden and hard to get to.
In comparison, Unix provides all the tools needed to take it apart and put it back together again. When you do need to interact with some syscall interface, there's almost always a complete CLI around it. It really makes it much easier to get into the nooks and crannies, inconsistencies aside.
Yeah, as a person who hates Windows and loves all things unix, the NT kernel and underlying system have long struck me as a well-designed, nice system... with a poor userland and a terrible UI on top. But the kernel is nice.
He's talking about the kernel. The Windows NT kernel and the Windows APIs use handles to represent kernel/API objects, and every handle has things like security associated with it.
For example, CreateProcess() gives you back a HANDLE value representing the process, and you can close it with CloseHandle(). Everything is a HANDLE: Files, pipes, threads, etc. A notable exception is sockets, which for historical reasons use an API modeled on BSD sockets.
The object stuff you're talking about is presumably COM, which is different. COM is great, but has nothing to do with the kernel.
> I've always argued that "everything is a file" is an exaggeration.
This is true, but also bear in mind that "everything is a file" didn't mean "everything is represented by a name in the mount tree", it really meant "(almost) everything is referred to by a file descriptor".
It's still true that the most painful things to deal with are the ones that aren't represented by a file descriptor.
I agree it's unfortunate, but it doesn't really seem in conflict with "everything is a file".
Making it a file is separate from making it sensible/usable for containers. Take the /proc filesystem: its entries are "files", but many of them don't work as expected inside a container without something like lxcfs. /proc/uptime, for example.
The abstraction is not really a file but a stream of bytes. It turns out that any object that looks like a stream of bytes needs a similar set of operations: open, create, read, write, close, seek, etc. This is a fairly generic and powerful abstraction.
The abstraction is a file descriptor. Not all things represented by file descriptors support read(2) or _llseek(2), but by representing them as file descriptors you can reuse other things like af_unix file descriptor passing.
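A small sketch of what that reuse buys you: a pipe's write end is handed across an AF_UNIX socket and then used on the other side. I'm doing both ends in one process to keep it short; in practice the receiver would be a different process. This assumes Python 3.9+ on a Unix-like system (that's when `socket.send_fds` / `socket.recv_fds` were added); the function name is made up.

```python
import os
import socket

def send_pipe_over_socket(message: bytes) -> bytes:
    """Pass a pipe's write end over a Unix socket, write through the copy."""
    read_end, write_end = os.pipe()
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # The descriptor travels as ancillary data next to a 1-byte payload.
        socket.send_fds(left, [b"x"], [write_end])
        _data, fds, _flags, _addr = socket.recv_fds(right, 1, 1)
        received = fds[0]      # a new descriptor referring to the same pipe
        os.close(write_end)    # the original is no longer needed
        os.write(received, message)
        os.close(received)
        return os.read(read_end, len(message))
    finally:
        os.close(read_end)
        left.close()
        right.close()

if __name__ == "__main__":
    print(send_pipe_over_socket(b"hello through a passed fd"))
```

The socket code never needs to know it is carrying a pipe; the same call would carry an open file, another socket, or anything else the kernel hands out as a descriptor.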
Everything is a file hasn't been true for Unix since almost the beginning. It's kind of like the Unix philosophy of small, independent tools...except for the database where you store all the important data.
This proposal implements clock offsets, but does it support continuous time scaling? One clock-handy use case would be to run your programs really slow or fast (or backwards!), for testing purposes.
Kaleida Labs' ScriptX (a multimedia programming language kinda like Dylan with classes) had built-in support for hierarchical clocks within the container (in the sense of "window" not "vm") hierarchy. The same way every window or node has a 2D or 3D transformation matrix, each clock has a time scale and offset relative to its parent, so anything that consumes time (like a QuickTime player, or a simulation) runs at the scaled and offset time that it inherits through its parent containers. And you can move and scale each container around in time as necessary, to pause movies or simulations. You could even set the scale to a negative number, and it played QuickTime movies backwards! (That was pretty cool in 1995 -- try playing a movie backwards in the web browser today!)
Q: How does the ScriptX core class library compare to class libraries available with other programming languages (e.g. MFC, OWL, MacApp, or TCL)?
A: The ScriptX core class library has many similarities to other object oriented frameworks in that it provides many basic services common to all applications built on them. All frameworks provide classes for creating windows, handling keyboard and mouse events, reading and writing files, etc.
Where ScriptX is unique is in its focus. The ScriptX core classes are oriented towards interactive, media rich applications. For example, clocks and timing are fundamental in the ScriptX class library; most other frameworks have no concept of timing built in.
ScriptX also tightly integrates media data (bitmaps, video, audio) with the class library, and hides the details of storing, retrieving, and presenting media to the user.
Q: What are the major sections of the core class library?
Clocks, players, and animation.
Time is a fundamental element of the core classes. Starting with basic clocks, subclasses extend the capabilities for animation, video, and audio playback.
Clocks can be tied to underlying hardware clocks or slaved in a hierarchical fashion to other clock objects. Varying the rate of a master clock, all sub-clocks will stay synchronized to the master clock, permitting the programmer to precisely control time in a title. Clock hierarchies also free the programmer from dealing with differences in performance between slower and faster CPUs.
Player classes build upon clocks. These classes allow you to create and play sequences of actions that take place over time. These sequences can be used to create animations as well as control other presentation elements such as video or sound.
A special type of clock object, the action list player, can be used to play actions in sequence at specified times. Various action objects are added to an action list specifying the time at which the action is to occur. Action objects are used to move graphic elements on the screen, execute ScriptX code, or modify the action list.
Other player classes provide simple ways to play digital video, audio, and MIDI. As with all players, the clocks underlying these players can be sped up, slowed down, or run backwards.
Q: Can video be synchronized with other events?
A: Yes, internal video players are based on ScriptX clocks and can be slaved together to provide synchronization with animations and other events. For example, buttons can appear in a window at precise times based on video playback.
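The clock hierarchy described above is easy to sketch. This is not ScriptX's actual API (I'm not reproducing that from memory), just a minimal Python illustration of the idea: each clock maps its parent's time through a scale and an offset, the same way a scene graph node maps its parent's coordinates. The class and method names are invented.

```python
class Clock:
    """A clock whose time is parent_time * scale + offset."""

    def __init__(self, parent=None, scale=1.0, offset=0.0):
        self.parent = parent
        self.scale = scale
        self.offset = offset
        self._ticks = 0.0  # raw time; only meaningful for a root clock

    def advance(self, dt):
        """Move raw time forward. Only the root clock is driven directly."""
        if self.parent is not None:
            raise ValueError("only the root clock is advanced directly")
        self._ticks += dt

    def now(self):
        base = self._ticks if self.parent is None else self.parent.now()
        return base * self.scale + self.offset

if __name__ == "__main__":
    root = Clock()
    movie = Clock(root, scale=-1.0, offset=10.0)  # plays backwards from t=10
    slow_motion = Clock(movie, scale=0.5)         # half speed, relative to movie
    root.advance(4.0)
    print(root.now(), movie.now(), slow_motion.now())
```

Setting a scale of 0 pauses everything below that clock. A real implementation would also re-base the offset whenever the scale changes, so that time doesn't jump at the moment you change speed; I've left that out.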
> This proposal implements clock offsets, but does it support continuous time scaling?
No. The main reason is that it's very difficult to do with the current time-keeping machinery within the kernel. Some people also want the ability to freeze the current time, which is similarly difficult -- and in some cases harder, because then what should CLOCK_MONATONIC give you? There's also the fact that there's currently no interface to set the "clock speed" or do any of these things.
Making time go backwards I think would simply be impossible, due to how many things in the kernel that interact with time probably make the (reasonable) assumption that time goes forwards. Also CLOCK_MONATONIC would do the exact opposite in such circumstances.
You mean "CLOCK_MONOTONIC", not "CLOCK_MONATONIC". (I'm guessing this is a misspelling, not a typo, since it appeared twice.)
And the simple answer is that if time stops then CLOCK_MONOTONIC always returns the same time. This is perfectly fine given correct software; CLOCK_MONOTONIC is guaranteed to not go backwards but it is not guaranteed to always go forward. One could imagine, for example, a system with a very inaccurate clock where CLOCK_MONOTONIC simply counts days.
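In code, "correct software" mostly means comparing with `>=` rather than `>` and never waiting for the clock to change. A minimal sketch, assuming a Unix-like system where Python exposes `time.CLOCK_MONOTONIC` (the helper names are mine):

```python
import time

def read_monotonic():
    """One reading of CLOCK_MONOTONIC, in seconds."""
    return time.clock_gettime(time.CLOCK_MONOTONIC)

def deadline_passed(deadline, now):
    # ">=" keeps working with a coarse or frozen clock that lands exactly
    # on the deadline and stays there.
    return now >= deadline

t0 = read_monotonic()
t1 = read_monotonic()
assert t1 >= t0    # this is what the clock promises
# assert t1 > t0   # this is NOT promised: two readings may be equal
```

Code that spins until a later reading is strictly greater than an earlier one is the kind that would hang under a frozen clock.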
> This proposal implements clock offsets, but does it support continuous time scaling? One clock-handy use case would be to run your programs really slow or fast (or backwards!), for testing purposes.
What use-case would you have for this? Making sure your program runs properly in the near presence of a black hole?
Sorry, I honestly didn't mean to be snarky. It was a genuine question. I couldn't really see the real-case justification for testing a clock that slowed down, sped up, or went backwards. But you're right, malfunctioning clocks would be an example.
I wonder whether namespacing time would also result in those namespaces being able to have separate "clocks" (time backends? time schedulers?) that progress at different rates, or for different reasons.
Being able to put a process into a time namespace with a deterministic "clock" would obviate a large benefit of http://www.zerovm.org/.
Also, having "clock slew" be a matter of perspective—with processes that can handle leap seconds seeing them happen instantaneously; and processes that can't handle leap-seconds, seeing slewed time—would be nice. Then you could have different system facilities that care about monotonic time, vs. synced to calendar time, vs. one second per second time, all having that kind of time available to them as "the time", rather than through different APIs.
Accelerated time might also be a way to test programs. It's similar to techniques used to test planes (by repeatedly pressurising and depressurising them). It might, for example, reveal race conditions faster in programs that ordinarily do a lot of sleeping. I wrote a bit more about this (unproven) idea here: https://rwmj.wordpress.com/2010/10/14/half-baked-ideas-accel...
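Until the kernel can do this for you, the usual userspace approximation is to inject a virtual clock so that "sleeping" costs nothing. A minimal sketch of the idea (the class is hypothetical, and it only helps programs written against it, which is exactly the limitation a time namespace would remove):

```python
import heapq

class VirtualClock:
    """Runs timers in order, jumping time forward instead of waiting."""

    def __init__(self):
        self.now = 0.0
        self._timers = []  # heap of (due time, sequence number, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def call_later(self, delay, callback):
        heapq.heappush(self._timers, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._timers:
            when, _, callback = heapq.heappop(self._timers)
            self.now = when  # skip straight to the next event
            callback()

if __name__ == "__main__":
    clock = VirtualClock()
    clock.call_later(3600, lambda: print("hourly job at", clock.now))
    clock.call_later(1, lambda: print("retry at", clock.now))
    clock.run()  # finishes immediately, having "lived" through an hour
```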
> Also, having "clock slew" be a matter of perspective—with processes that can handle leap seconds seeing them happen instantaneously; and processes that can't handle leap-seconds, seeing slewed time—would be nice.
I imagine there might be some really interesting (for meanings of interesting that include shoot me now) and hard to track down bugs as you deal with inconsistent clocks not just across systems within a network, but processes within a single system.
> I imagine there might be some really interesting (for meanings of interesting that include shoot me now) and hard to track down bugs as you deal with inconsistent clocks not just across systems within a network, but processes within a single system.
I feel like the "safe assumption" that the other end of a given IPC channel (or even inter-thread communication channel) is on the same machine, is responsible for the vast majority of failures we see in e.g. Jepsen testing of databases.
After all, in sufficiently-large computers (i.e. HPC clusters that pretend to be one "computer"), you've got NUMA zones that are light-microseconds away from one another, where even threads of the same process can literally end up needing vector clocks to linearize events between themselves.
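For anyone who hasn't met them, a vector clock is just one counter per participant, merged on every message. Strictly speaking it gives you a partial order (it tells you which events are concurrent) rather than a single linear one. A toy sketch in Python, with names of my own choosing:

```python
def vc_tick(clock, node):
    """Return a copy of `clock` with `node`'s counter incremented."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def vc_merge(local, received, node):
    """On receiving a message: take the element-wise max, then tick."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return vc_tick(merged, node)

def happened_before(a, b):
    """True if the event stamped `a` causally precedes the one stamped `b`."""
    keys = set(a) | set(b)
    no_later = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_later and strictly

if __name__ == "__main__":
    a1 = vc_tick({}, "A")         # event on thread A
    b1 = vc_tick({}, "B")         # independent event on thread B
    b2 = vc_merge(b1, a1, "B")    # B receives A's message
    print(happened_before(a1, b1), happened_before(a1, b2))
```

If neither `happened_before(x, y)` nor `happened_before(y, x)` holds, the two events are concurrent, and no wall clock on either side can tell you otherwise.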
It probably wouldn't be too bad a thing if things like the Linux base-system used only internal IPC mechanisms that exposed this unreliability (like e.g. Erlang does with "unreliable async message passing" as its IPC primitive), forcing each component to deal with the fact that its peers may or may not be netsplit away from it.
Even if that scenario will only come up if you're writing code to get your GPS position from a Dyson sphere of 10-mile-deep Matryoshka brains.
I bet that assumption is responsible for a large number of problems. I just also think it's correct enough most of the time, and relied on enough, that if it suddenly stopped being true much of the time, we'd see our carefully crafted applications for what they really are: a pile of assumptions that sometimes have little relation to reality.
More accurately, the clocks of the Linux virtual machine running Docker containers would differ from the OS X clock.
Those aren't really containers skewing from other processes on the same system, as the parent describes, but clocks skewing between two different systems (which is a totally normal thing we deal with regularly).
There is a time namespace proposal[1], but currently the answer to this question is no. The reason is that timekeeping is incredibly complicated within the kernel (for instance -- when userspace gets the current time, it's read from a vDSO page that the kernel injects into every process and thus is updated by the kernel asynchronously). Adding different clock speeds is already non-trivial, let alone switching out different time backends.
The current time namespace proposal just allows you to set the current time separately from the host, which is actually quite a difficult thing to do already (it takes 20 patches)...
Is ZeroVM still active? I loved the concept, but the startup behind it is gone, and it’s built on NaCl, which is being deprecated in favor of wasm... I would love to see it ported to wasm and expanded beyond the original “python compute embedded in openstack storage” use case, which was underwhelming. There are so many more exciting applications to server-side wasm. I hope someone actually builds this.
At one point I got my hopes up that Docker would build this as the logical next step after Linux containers... But they seem to be focused on monetizing the containers/kubernetes movement, which makes sense as a business decision but still is disappointing.
Considering that PNaCl was made for running untrusted, user-supplied native code in a sandbox-environment resembling that of native Linux binaries; and was used for this in e.g. Google App Engine to build the various first-generation container runtimes...
...then perhaps gVisor (or a thin "make everything deterministic" layer on top of it) could be looked at as something like a "spiritual successor" to ZeroVM?
Couldn't that just be done as part of libfaketime? Now granted it's harder to do an entire OS with that but you could run it within a VM that itself is run by libfaketime, no?
I personally miss core pattern namespacing. I would love to give some of my containers a custom coredump handler, but this is impossible.
And in general, a sysctls settings namespace would be really useful. Sure, sometimes it makes no sense to namespace a setting, but net.ipv4.tcp_congestion_control for example? I'd love to be able to change it without modifying the code.
Shouldn't break backwards compatibility. More than anything, my guess is that it's just a result of most of Linux's modern day design having been implemented before the era of containers. Afaik, namespaces+cgroups were never meant to support complete isolation.
Another use case is dealing with tokens that assume globally synchronised clocks such as JWTs and Kerberos/Active Directory. Ideally all clocks would be perfectly synchronised but things happen.
For example, you might have one container that’s exchanging JWTs with a micro service that should be using AWS’s NTP servers and another that’s joined an Active Directory domain that should be using the AD NTP server. Right now you either need to run them on separate machines or expose yourself to interesting problems if clock skew happens.
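The workaround most people reach for today is a skew allowance ("leeway") when checking the token's time claims, rather than fixing the clocks. A minimal sketch of that check, assuming the standard `nbf` and `exp` claims as seconds since the epoch; the function name and `leeway` parameter are mine, and a real JWT library should be doing signature verification before any of this matters:

```python
def token_time_valid(claims, now, leeway=0):
    """Check 'nbf' and 'exp', tolerating `leeway` seconds of clock skew."""
    nbf = claims.get("nbf")
    exp = claims.get("exp")
    if nbf is not None and now + leeway < nbf:
        return False  # token is not valid yet, even allowing for skew
    if exp is not None and now - leeway >= exp:
        return False  # token has expired, even allowing for skew
    return True

if __name__ == "__main__":
    claims = {"nbf": 1000, "exp": 1300}
    print(token_time_valid(claims, now=995))             # our clock runs 5s behind
    print(token_time_valid(claims, now=995, leeway=10))  # tolerated with leeway
```

Leeway only papers over small skew, and it widens the window in which an expired token is still accepted, which is why per-container time sources look attractive for the AD-versus-AWS case.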
This doesn't change what gettimeofday(2) gives you (and actually you can't even use ptrace easily to fake the time of day, because gettimeofday(2) usually never enters the kernel -- on most architectures it's serviced in userspace by reading the vDSO page the kernel maps into every process).