Hacker News new | ask | show | jobs
by binarycrusader 4682 days ago

  If you think OSTree needs filesystem snapshot support, you
  aren't understanding how it works. Or perhaps we don't
  have a shared definition of "robust" - for me, assuming
  correct behaviour at the filesystem and block layer, I
  believe deployment changes to be atomic.
While OSTree itself may not need filesystem snapshot support, for "robust OS upgrades" (the original title of the ycombinator story) you do need some sort of snapshotting mechanism. Whether that's in the form of ZFS-style snapshots, or boot environments is up to the implementer.

Also, I'd argue that assuming correct operation at the filesystem and block layer is a pretty big assumption :-) Even if the software itself works correctly, time has proven that trusting the hardware is not a good thing.

With that said, yes, I think OSTree needs filesystem snapshot support if admins are to rely on it to provide "robust" OS upgrades.

  How do you determine reboot-needed=true? Is that
  something assigned by the package developer statically
  (i.e. kernel is reboot-needed=true)? Determined
  dynamically at update time (like in the package metadata
  for a specific revision?)
Currently, reboot-needed is determined by the package creator at the individual item level by tagging individual "actions" in a package manifest. It's a way for a package creator say "this component can't be safely updated on a live system, so force the creation and use of a boot environment clone when updating it."

Further refinement of this functionality is planned when applied to specific types of actions found in a package manifest, such as "driver" actions. But it will be handled by the package system transparently.

  Do you attempt to control for local configuration?
In what sense? pkg(5) allows an admin to force the creation and use of a clone boot environment as part of any package-modifying operation.

If you're talking about service configuration, that's handled via SMF (the service management facility) which has its own snapshot system that services use.

If you're talking about legacy service configuration that uses flat files on-disk, then package creators have the option of using service actuators to trigger a restart or refresh of the service whenever the configuration file changes.

You'd have to be more specific.

  Is the X server package reboot-needed=true?
No, because any files that the X server needs are already loaded into memory so updates to the on-disk files won't generally affect it. Any changes made to the X server binaries and libraries on-disk won't take effect until the X server is restarted.

An incompatible change in the X server itself (such as a protocol change, which is very rare) will likely come with a related kernel change which would require a reboot. But again, the package creator is in control of that.

Also, the "reboot-neededed" tag doesn't force a reboot per se; it just forces the update of a given component to be performed in a clone boot environment.

1 comments

> With that said, yes, I think OSTree needs filesystem snapshot support

I'm happy with the fact that it allows admins to choose whatever they want at the filesystem and block level, the same as dpkg/rpm do. For example in a cloud environment, block level redundancy may be more easily provided at the infrastructure level for guests.

For those who do need redundancy, you're free to choose BTRFS, XFS+LVM, or hardware raid, whatever suits you.

As for the rest of your reply around the reboot-needed flag; it basically sounds like it's fairly manual, but maybe that's "good enough". I have a particular paranoia about race conditions though, so were I to design a system that attempted to do live update applications, it'd be a whilelist, not a blacklist as reboot-needed effectively is.

  I'm happy with the fact that it allows admins to choose
  whatever they want at the filesystem and block level, the
  same as dpkg/rpm do. For example in a cloud environment,
  block level redundancy may be more easily provided at the
  infrastructure level for guests.
While it may provide admins with choice, it doesn't provide them with "robust OS upgrades". Again, that's my only quibble here.

  For those who do need redundancy, you're free to choos
  BTRFS, XFS+LVM, or hardware raid, whatever suits you.
There's a large variance in those solutions that doesn't provide the same end-user experience (or even result), and without integration with the package system, doesn't really provide the sort of safety net most administrators are looking for.

  As for the rest of your reply around the reboot-needed
  flag; it basically sounds like it's fairly manual, but
  maybe that's "good enough".
It's only "manual" in some sense for package creators; not administrators. For administrators, it's a transparent decision about what's safe to update and what's not.

As for actually being manual, not really. The package system provides package creators with a tool called pkgmogrify, which allows them to set rules that cause transformation of actions based on patterns at package publication time. As an example, Solaris has a set of rules that say that, by default, any files delivered to /kernel should be tagged with reboot-needed=true.

I won't claim it gets the OS 100% coverage, but as has been said before "perfect is the enemy of good."

  I have a particular paranoia about race conditions though,
  so were I to design a system that attempted to do live
  update applications, it'd be a whilelist, not a blacklist
  as reboot-needed effectively is.
While I understand the paranoia, a whitelist is not actually a practical option. The vast majority of content delivered onto a modern UNIX or UNIX-like system doesn't require a reboot when being updated. man pages, header files, and most userspace binaries and libraries can all be safely updated without a reboot.

For example, on my desktop workstation, ~289,806 items have been installed by the package system. Of those, only 857 have been determined to require a reboot if they are updated.

That means that roughly 0.37% (yes, < 1%) of the items on my system have been determined to require updates to be performed in a clone boot environment (require a reboot).

For enterprise-level customers, minimising the number of reboots needed to enact change is paramount. Any downtime at all can often cost them millions of dollars.