by cortesoft 1184 days ago
> In particular, the union file system (UFS) image format is a choice that seems more academically aspirational than practical. Sure, it has tidy properties in theory, but my experience has been that developers spend a lot more time working around it than working with it.

What is the alternative that is better? The ability to have layers that build on top of each other and can be cached is a big feature... what alternatives provide that and are better?

3 comments

IMO image definitions should be a list of mounts that may be overlays on root but may also be more “normal” mounts to directories within root. I should be able to make an image that is ubuntu:bionic plus a conda installation at /opt/conda plus a personal package at /usr/local/mything. Currently you have to decide how to stack those layers, which is unnatural and prevents sharing/deduplication of partial filesystem images where there’s no reason to prevent it.

Taken to the extreme, look at something like Nix (or conda, come to think of it). Why can’t I just have one copy of a package of a given version shared by all containers, if they all want that package? Unix file systems should be great at that kind of composability; that’s the advantage of a unified tree instead of a tree-per-source. But in the docker model, you’re stuck with a stack.

My ideal image definition is a hybrid between docker’s immutable hash-addressed image layers and an fstab file to describe how and where to mount them all.
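To make that concrete, here is a rough sketch of what such a definition could look like. Everything in it is hypothetical (the `Mount` type, `tree_digest`, and the fstab-like output are made up for illustration, not any real Docker or OCI format): each tree is hashed on its own contents, with no parent layer mixed in, and an image is just a list of (digest, mount point) pairs.

```python
import hashlib
from dataclasses import dataclass


def tree_digest(files: dict[str, bytes]) -> str:
    """Content hash of a file tree, independent of any parent layer (illustrative)."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return "sha256:" + h.hexdigest()


@dataclass(frozen=True)
class Mount:
    digest: str   # which immutable tree to mount
    target: str   # where it appears inside the container root


def render_fstab(mounts: list[Mount]) -> str:
    """Emit fstab-like lines, shallower targets first so nested mounts land on top."""
    ordered = sorted(mounts, key=lambda m: m.target.rstrip("/").count("/"))
    return "\n".join(f"{m.digest[:19]}  {m.target}  ro" for m in ordered)


# Stand-in trees; real ones would be whole directory hierarchies.
base = tree_digest({"/etc/os-release": b"bionic"})
conda = tree_digest({"/bin/conda": b"conda-binary"})
mything = tree_digest({"/bin/mything": b"v1"})

image_a = [Mount(base, "/"), Mount(conda, "/opt/conda"), Mount(mything, "/usr/local/mything")]
image_b = [Mount(base, "/"), Mount(conda, "/opt/conda")]

# Trees both images can share on disk, regardless of "stack order".
shared = {m.digest for m in image_a} & {m.digest for m in image_b}
print(render_fstab(image_a))
```

Because no digest depends on what sits "underneath" it, the conda tree is the same object in both images, and there is no stacking decision to make.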

The POSIX standard requires certain behaviours from the filesystem, that POSIX-compliant software can rely on.

Unfortunately, those behaviours are mutually exclusive with transparent layering.

It's certainly possible to build a file-system whose behaviours are compatible with that kind of transparent layering - Plan9 was built on exactly that model, for example - but then it wouldn't be a POSIX-compliant filesystem anymore.

The promise of Docker was that you'd be able to deploy your existing applications in a more reliable, repeatable way, but that breaks down when you have to tinker with your application's file-handling code, or jump through extra hoops to flatten the layers of your container's filesystem image.

One should make a distinction between:

* The general idea of mixing together filesystems+folders to achieve re-use/sharing/caching.

* The "Dockerfile" approach to this - with its linear sequence of build-steps that map to a linear set of overlays (where each overlay depends on its predecessor).
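A toy model of the difference, for illustration only. This is not Docker's actual cache-key computation, and the "independent" variant assumes the steps really don't depend on each other's output. In the linear model a layer's identity includes its parent's identity, so touching one step invalidates everything after it.

```python
import hashlib


def chain_ids(steps: list[str]) -> list[str]:
    """Linear model: each layer id hashes its step plus the id of the layer below."""
    ids, parent = [], ""
    for step in steps:
        parent = hashlib.sha256((parent + "\n" + step).encode()).hexdigest()
        ids.append(parent)
    return ids


def independent_ids(steps: list[str]) -> list[str]:
    """Mix-and-match model: each piece is identified by its own step only."""
    return [hashlib.sha256(step.encode()).hexdigest() for step in steps]


def reusable(old: list[str], new: list[str]) -> int:
    """Count positions whose cached artifact can be reused unchanged."""
    return sum(o == n for o, n in zip(old, new))


before = ["FROM ubuntu:bionic", "install conda 4.8", "install mything"]
after = ["FROM ubuntu:bionic", "install conda 4.9", "install mything"]

linear_hits = reusable(chain_ids(before), chain_ids(after))
independent_hits = reusable(independent_ids(before), independent_ids(after))
print(linear_hits, independent_hits)
```

Only the conda step changed, yet in the chained model the unrelated "mything" layer also gets a new id and has to be rebuilt and re-shipped.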

The "Dockerfile" approach is pretty brilliant in a few ways. It's very learnable. You don't need to understand much in order to get some value. It's compatible with many different distribution systems (apt-get, yum, npm, et al).

But although it's _compatible_ with many, I wouldn't say it's _particularly good_ for any one. Think of each distribution-system -- they all have a native cache mechanism and distribution infrastructure. For all of them, Dockerization makes the cache-efficacy worse. For decent caching, you have to apply some ad hoc adaptations/compromises. (Your image-distribution infra also winds up as a duplicate of the underlying pkg-distribution infra.)

Here's an alternative that should do a better job of re-use/sharing/caching. It integrates the image-builder with the package-manager:

https://grahamc.com/blog/nix-and-layered-docker-images/

Of course, it trades away the genericness of a "Dockerfile", and it no doubt required a lot of work to write. But if you compare it to the default behavior or to ad hoc adaptations, this one should provide better cache-efficacy.
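As a rough idea of what "integrating the image-builder with the package-manager" can buy you, here is a hypothetical sketch. The `plan_layers` function and its popularity heuristic are my own illustration, not the algorithm from the linked post: if the builder knows each app's dependency closure, it can give the most widely shared packages their own layers and lump the long tail into one final layer.

```python
from collections import Counter


def plan_layers(closures: dict[str, set[str]], max_layers: int) -> list[set[str]]:
    """One layer per widely shared package; everything else goes in a final layer."""
    # How many apps pull in each package.
    popularity = Counter(pkg for deps in closures.values() for pkg in deps)
    # Most shared first; ties broken by name so the plan is deterministic.
    ranked = sorted(popularity, key=lambda p: (-popularity[p], p))
    own = ranked[: max_layers - 1]
    rest = set(ranked[max_layers - 1:])
    layers = [{p} for p in own]
    if rest:
        layers.append(rest)
    return layers


# Hypothetical dependency closures for three apps.
closures = {
    "app1": {"glibc", "openssl", "python"},
    "app2": {"glibc", "openssl", "nodejs"},
    "app3": {"glibc", "curl"},
}
plan = plan_layers(closures, max_layers=3)
print(plan)
```

The point is that the layer boundaries come from the package graph rather than from the order of lines in a build script, so a bump to one leaf package doesn't disturb the layers holding the widely shared ones.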

(All this is from the POV of someone doing continuous-integration. If you're a downstream user who fetches 1-4 published images a year, then you're just downloading a big blob -- and the caching-layering stuff is kind of irrelevant.)