Hacker News
by glangdale 2883 days ago
Seriously, yes. If there was ever a time to rethink OS design, surely this is it.

That being said, operating systems like Linux tend to capture most of the value from these kinds of advances, often by dint of being able to simply 'get out of the way' if a sufficiently important user space process wants access to the device.

But one would suspect that things have changed sufficiently from the 1970s to warrant a ground-up rethink. Core counts, distributed systems (the Plan 9 folks already took a swing at this in the 90s), nearly ubiquitous graphics/GPGPU accelerators, persistent memory, nearly ubiquitous access to 64-bit address spaces (at least for desktop and most phones) - you'd think something would change about OS design. I don't work in the area so I don't know what that is...

> Seriously, yes. If there was ever a time to rethink OS design, surely this is it.

Why?

Traditional servers are persistent: they never turn off. 500+ days of uptime is typical. And today, with VMs which at worst... hibernate... it seems like "never turning off" might be the norm.

On the contrary, as a security professional I’d be thrilled if servers had a lifespan of hours instead of weeks or months. Reimaging VMs/containers/machines from scratch frequently gives so many advantages.

When OS, system, or library updates happen, you can easily launch replacement servers on the updated stack, put them in the rotation, and decommission the old ones. This is so much simpler than trying to run OS upgrades in-place across an entire fleet. The longer a machine has been running between reboots, the lower my belief in its odds of upgrading and restarting cleanly.
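The launch, rotate, decommission loop described above can be sketched roughly as follows. This is a toy in-memory model, not any real orchestration API: `Server`, `launch`, and `rolling_replace` are hypothetical names made up for illustration, and "launching" here just creates an object.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Server:
    """Hypothetical stand-in for a VM/container in the load balancer rotation."""
    id: int
    image: str


_ids = count(1)


def launch(image):
    # Assumption: launching always succeeds and returns a healthy server.
    return Server(next(_ids), image)


def rolling_replace(rotation, new_image):
    """Swap out every server not built from new_image, one at a time.

    The replacement joins the rotation before the old server leaves,
    so the number of servers in rotation never drops below its starting value.
    """
    for old in [s for s in rotation if s.image != new_image]:
        rotation.append(launch(new_image))  # put the replacement in rotation
        rotation.remove(old)                # then decommission the old server
    return rotation


fleet = [launch("stack-v1") for _ in range(3)]
rolling_replace(fleet, "stack-v2")
print(fleet)
```

A real setup would add health checks and connection draining between the two steps; the point is only that nothing is ever upgraded in place.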

Further, this regularly tests your load balancing setup and, pretty much by construction, gives you the capacity to scale up and down as load demands. Problems will be discovered early on, instead of during crunch time when you have to scale or when a few of your machines go offline during peak hours.

Security-wise, you don’t just get the benefit of fast, regular updates; you also get assurance that users haven’t left stale data like unencrypted database exports, PII dumps, etc. lying around. Go on a long-lived machine some day and check out users’ home directories. That shit is a gold mine if someone who wants to do harm gets on your systems.

Not to mention regular reimaging makes it harder for an attacker to establish a permanent foothold in your infra.

None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.

You forgot, frequent rebuilds kill off any intrusion as the world reflashes -- unless they get into IoT or microcontroller packages.

I did mention that it makes it harder for an attacker to keep a foothold in your infrastructure, but I think I wasn't as clear as I wanted to be.

But yeah, it's bad that an attacker has been able to get to a critical system, but it's a phenomenal defense if any of their beacons or remote access tools last at most a few hours or days before being wiped. This makes an attacker's life much harder.

> None of this has anything to do with fast persistent storage, but I sincerely hope the era of 500-day uptimes is waning.

On the contrary. Persistent memory means that infinite uptime is the future. Which, as you note, is difficult. Resetting the OS every now and then to a known state is a good practice, although disruptive to a lot of workflows.

If anything, I consider your post to be an argument AGAINST persistent memory.

Persistent memory might enable those sorts of uptimes, but it doesn't inherently mandate it.
But traditional operating systems still assume RAM contents is volatile (because currently it is), most filesystems assume disks are glacially slow, etc.

A traditional spinning rust HDD has an effective latency of ~10 ms. The NVMe version of 3D XPoint (3DXP) has an effective latency of ~50 µs, or roughly two orders of magnitude better. Not sure how low the DIMM version will go, but maybe another order of magnitude?

If so, we're talking three orders of magnitude difference. That would radically affect the assumptions going into storage algorithms. Suddenly you can no longer spend millions of instructions trying to avoid I/O. Batching of I/O is also not needed to the same degree. Complex syncing of memory and disk is not needed. Etc etc.
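To put rough numbers on that, here is a back-of-the-envelope sketch. The 10 ms and 50 µs figures are the ones quoted above; the 5 µs DIMM figure is just the "another order of magnitude" guess, and the 3 GHz, one-instruction-per-cycle core is an assumption picked for round numbers.

```python
import math

# Latencies in seconds. The first two are the figures quoted in the comment.
HDD_LATENCY_S = 10e-3         # ~10 ms spinning disk
NVME_3DXP_LATENCY_S = 50e-6   # ~50 us NVMe 3D XPoint
# Assumption: the DIMM version is one more order of magnitude faster (a guess).
DIMM_GUESS_LATENCY_S = 5e-6

# Assumption: a 3 GHz core retiring about one instruction per cycle.
INSTRUCTIONS_PER_SECOND = 3e9


def orders_of_magnitude(slow_s, fast_s):
    """How many powers of ten separate two latencies."""
    return math.log10(slow_s / fast_s)


def instruction_budget(latency_s):
    """Instructions the assumed core could retire while waiting on one I/O."""
    return round(latency_s * INSTRUCTIONS_PER_SECOND)


for name, latency in [
    ("HDD", HDD_LATENCY_S),
    ("NVMe 3DXP", NVME_3DXP_LATENCY_S),
    ("DIMM (guess)", DIMM_GUESS_LATENCY_S),
]:
    gap = orders_of_magnitude(HDD_LATENCY_S, latency)
    print(f"{name:13s} {instruction_budget(latency):>12,d} instructions per I/O, "
          f"{gap:.1f} orders of magnitude faster than HDD")
```

Under those assumptions one HDD access costs about 30 million instructions' worth of waiting, the NVMe device about 150 thousand, and the guessed DIMM figure about 15 thousand, which is why burning millions of instructions to dodge a single I/O stops paying off.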

> But traditional operating systems still assume RAM contents is volatile (because currently it is)

RAM is only volatile on startup. Certainly not when a VM hibernates and comes back.

> RAM is only volatile on startup.

No it isn't. Anything that needs to survive a power cycle needs to go to non-volatile storage. And this is assumed to be very, very slow.
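For what "go to non-volatile storage" looks like from user space today, a minimal sketch, assuming a POSIX system: data is only expected to survive a power cycle once it has been explicitly flushed with `os.fsync`. The helper name `durable_write` is made up for this example.

```python
import os
import tempfile


def durable_write(path, data):
    """Write bytes so they are expected to survive a power cycle (POSIX assumed).

    Pattern: write a temp file, fsync it, atomically rename it into place,
    then fsync the containing directory so the rename itself is persisted.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # push Python's buffer to the kernel
            os.fsync(f.fileno())   # ask the kernel to push it to the device
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Assumption: directories can be opened and fsynced (true on Linux/macOS).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "state.bin")
        durable_write(target, b"must survive a power cycle")
        with open(target, "rb") as f:
            print(f.read())
```

Every `fsync` here is a round trip to the storage device, which is the slow step software is built to avoid or batch.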

I don't think "computers stay up a long time these days" is an argument against doing OS research on order-of-magnitude-faster, byte-addressable persistent storage.

We seem to be doing pretty well with a bunch of abstractions from the 1970s, as well as with the idea of just building giant trapdoors into our hardware whenever these abstractions fail (e.g. most databases, DPDK in the network space, etc). It's not a crisis. It just seems like a pretty good time to do some basic OS research (aside from all the usual headwinds for that, e.g. massive complexity of underlying hardware, difficulty finding meaningful workloads for a "toy" OS, etc).

Your ideas around uptime are still in the '70s. Systemd updates require reboots. Then there are the Spectre and Meltdown BIOS updates, gotta reboot for those. Oh, and the SSD and NIC firmware as well.

To think we formerly only had one devil in glibc. Now everything is constantly being updated and it's fine. We've moved on from the uptime-as-phallic-measuring-stick mantra. Patch, reboot, and stay secure.