by xorcist 4076 days ago
Isn't the point of running an application in a container, or any chrooted environment, to only isolate the application from the rest of the operating system?

Then why would you start out with a complete extra operating system in there? Why not just put the application and its dependencies in there?

Stripping non-dependencies from a complete operating system sounds like a very failure-prone way to accomplish almost the same thing. You really need to execute all code paths, which is difficult to guarantee (did you really run your application in all locales, for example?).


> Then why would you start out with a complete extra operating system in there? Why not just put the application and its dependencies in there?

Packaging is hard. Let's go shopping!

A layered file system already does this.
Any Unix-ish application (i.e. one that shells out to do something at some point) will have a package dependency tree that ends up transitively closing over the "base"/"essential" package-set of the OS. "Dependency" has three meanings, to a packaging system, even though at run-time only one of them is relevant. There are:

1. "run-time dependencies" — package B needs package A installed because a binary from B actually makes use of a file from A when it runs.

2. "install-time dependencies" — package B needs package A installed because B is effectively a "plugin" for A. B is theoretically useless to the OS, except when used in the context of a sane A-like environment. This usually also implies that B, when installing itself, will run a script provided by A, usually to register itself in a database that A owns. This doesn't at all imply, though, that you couldn't just directly call the binary contained in the A package for a useful effect.

3. "asynchronous/maintenance-time dependencies" — package B needs package A because B does something to increase the system's entropy, and is written to assume that the system will compensate for this by having A running.

Docker images really only need type-1 dependencies, but as you dig toward the core of a package dependency graph, you start to see a lot more of type-2 and type-3 dependencies. If you execute a "debootstrap --variant=minbase", pretty much everything in there is there for type-2 or type-3 reasons.
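
To make the three categories concrete, here is a minimal sketch that tags each edge of a dependency graph with its kind and compares the full transitive closure against the closure over run-time edges only. The package names and the graph itself are invented for illustration; no real package manager exposes its metadata in this shape.

```python
from enum import Enum


class DepKind(Enum):
    RUN_TIME = 1      # B uses a file from A while it runs
    INSTALL_TIME = 2  # B registers itself with A when it is installed
    MAINTENANCE = 3   # B assumes A is running to clean up after it


# Hypothetical graph: package -> list of (dependency, kind of dependency).
DEPS = {
    "myapp": [("libfoo", DepKind.RUN_TIME), ("libbar", DepKind.RUN_TIME)],
    "libfoo": [("libbar", DepKind.RUN_TIME), ("foo-registry", DepKind.INSTALL_TIME)],
    "libbar": [("log-rotator", DepKind.MAINTENANCE)],
    "foo-registry": [("scripting-runtime", DepKind.RUN_TIME)],
}


def closure(root, kinds):
    """Return every package reachable from root via edges whose kind is in kinds."""
    seen = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        for dep, kind in DEPS.get(pkg, []):
            if kind in kinds:
                stack.append(dep)
    return seen


full = closure("myapp", set(DepKind))
runtime_only = closure("myapp", {DepKind.RUN_TIME})
print(sorted(full))
print(sorted(runtime_only))
```

In this toy graph the run-time closure stops at the two libraries, while the full closure also drags in the registry, its scripting runtime, and the maintenance daemon.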

A Docker container doesn't need to be a maintainable or autonomous OS distribution. It doesn't need grub, it doesn't need mkfs or fsck, it doesn't need mkinitramfs or the HAL hwdb; it doesn't need localegen, or debconf, or even apt itself. It needs to be a baked, static collection of files related to the application's run-time needs. But there's no demand you can make of apt or yum or even debootstrap that will spit out such a thing.

There was a project somewhat in this vein a long time ago, for embedded systems, called "Emdebian Baked"[1]. It was a misstep, I think, because it focused on creating variants of packages and a secondary dependency graph, rather than being a transformation one could apply to existing packages and the existing graph.

I've worked on and off on creating a transformation tool—effectively, a combination of a dependency graph "patch" that contains empty virtual-packages for many essential-package dependencies, a file filter/blacklist, and a final package whose installation burns away the whole package-management infrastructure from the chroot this is executing in. I haven't been happy with any of the results yet, though. Would anyone be interested in collaborating on such a thing as an open-source project?
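
As a rough illustration of only the file filter/blacklist piece of such a tool, the sketch below drops paths from a file manifest when they match a glob pattern. The patterns, the manifest, and the function names are all hypothetical; this is not the tool described above, and a real blacklist would need far more care.

```python
from fnmatch import fnmatch

# Hypothetical blacklist: documentation plus package-management state and tools.
BLACKLIST = [
    "/usr/share/doc/*",
    "/usr/share/man/*",
    "/var/lib/dpkg/*",
    "/usr/bin/apt*",
]


def keep(path, blacklist=BLACKLIST):
    """True if path matches none of the blacklist patterns."""
    return not any(fnmatch(path, pattern) for pattern in blacklist)


def filter_manifest(paths, blacklist=BLACKLIST):
    """Return the paths that survive the blacklist, in their original order."""
    return [p for p in paths if keep(p, blacklist)]


# Hypothetical manifest of files present in a chroot.
manifest = [
    "/usr/bin/myapp",
    "/usr/bin/apt-get",
    "/usr/lib/libfoo.so.1",
    "/usr/share/doc/libfoo/README",
    "/var/lib/dpkg/status",
]
print(filter_manifest(manifest))
```

Note that `fnmatch` lets `*` match across `/`, so a pattern such as `/usr/share/doc/*` covers the whole subtree.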

[1] http://www.emdebian.org/baked/

Nix helps with this. If I recall correctly, there is an optimisation pass that can create hard links between identical files across packages, and there was recent talk of package deduplication. Every package also directly specifies every one of its dependencies. However, it currently won't help you remove unused files from each package, since that would violate the immutable hashes. The solution is to create more granular packages, or to leave the immutability zone for the mutable world if you have embedded scenarios.
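
The general idea of hard-link deduplication can be sketched as follows. This is not how Nix implements it; it is a minimal illustration that hashes file contents with SHA-256 and ignores differences in ownership, permissions, and timestamps, which a real tool would have to consider. The function names are made up for the example.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe(root):
    """Replace byte-identical regular files under root with hard links.

    Returns the number of files that were replaced by a link.
    """
    first_seen = {}  # digest -> path of the first file seen with that content
    linked = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            original = first_seen.setdefault(file_digest(path), path)
            if original == path or os.path.samefile(original, path):
                continue  # first copy, or already linked to it
            tmp = path + ".dedupe-tmp"
            os.link(original, tmp)  # create the new link under a temporary name
            os.replace(tmp, path)   # then atomically swap it into place
            linked += 1
    return linked
```

Calling `dedupe("/some/tree")` on a tree with duplicated files leaves every path in place but makes the duplicates share one inode, so this only works within a single filesystem.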
I beg to differ, but we can probably compare data points until the cows come home.

Anyhow, even a large-ish application such as Oracle or a control system doesn't actually use ping or dd or troff, or most of what a modern Unix OS is composed of. Most suid binaries are unnecessary, and leaving them out, if nothing else, decreases the attack surface.
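
One quick way to see how much suid material a chroot or image carries is to walk its tree and list regular files with the set-user-ID bit. A minimal sketch, assuming the root filesystem is unpacked at a directory you pass in (the function names are illustrative):

```python
import os
import stat


def is_setuid(mode):
    """True if mode describes a regular file with the set-user-ID bit set."""
    return stat.S_ISREG(mode) and bool(mode & stat.S_ISUID)


def find_setuid(root):
    """Return a sorted list of set-user-ID regular files under root."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable or vanished entry; skip it
            if is_setuid(mode):
                found.append(path)
    return sorted(found)
```

Running `find_setuid` over a full OS tree and over an application-only tree gives a direct, if crude, comparison of the two approaches.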

Most web apps probably need nothing Unix-ish at all. A chrooted PHP app mounted noexec makes me sleep better than one running in a complete operating system. And most server-side Java apps reinvent everything Unix anyway, from mail processing to cron jobs, so they generally don't shell out as often as you'd think.

So I would argue it's actually pretty common that your applications have a limited set of dependencies. Especially compared to the hundreds of packages in any minimal modern unix install.

I agree that it's common, but it's not common enough to make this into a helpful property if you're trying to define a 100% solution. The reason Docker exists at all, apart from just nsexec(1)ing static binaries, is that a lot of things do need an environment—not of other Unix binaries per se, but of library assets like locales, charmaps, keymaps, geoip mappings, etc.—and then these asset packages think they're there to provide assets for maintenance-time functionality of a computer rather than to provide run-time functionality to an app in a container, so they pull in utilities related to themselves, which pulls in the base system.

If you can manage to get a working install of Postgres without pulling in half of Debian, I would be surprised.

But yes, on the other hand, it's perfectly possible to package some things, like the JVM, in a sort of "spread-out in a directory but equivalent to static-linked" fashion. The sort of things that tell you to "unzip them into /opt/thispkg", because they don't really follow any Unix idioms at all, tend to be surprisingly container-friendly. They come from a world where binaries are expected to be portable across systems with different versions of OS libraries available, rather than a world where each app gets to ask the OS to install whatever OS library versions it requires.

Postgres is actually a good counterexample to your point. It is a self-contained application that doesn't shell out. It doesn't need to access any of the things you mention, including charmaps, keymaps and geoip mappings.

I regularly run it chrooted without problems. You do need to understand your use case, however. Things like external database utilities and backup scripts differ in their requirements. Some of them are run outside the chroot, some aren't.

It's absolutely not complicated, and if you have the faintest idea what you're doing it's much easier to get right than the fanotify dance described above.

And a complete operating system in a chroot would sit mostly unused, and only increase the attack surface for no reason at all. So, why?

> If you can manage to get a working install of Postgres without pulling in half of Debian, I would be surprised.

You mean like in this blog-post: https://blog.docker.com/2013/06/create-light-weight-docker-c...

It's not only about the isolation, but also reproducibility across time and portability across machines.