by subway 3673 days ago
Containerization in Linux is fugly. There is no core concept of containers in the kernel; you just have a set of loosely integrated namespaces abused by the likes of lx[cd] and Docker.

I don't share your opinion. The kernel exposes a collection of primitives (including but not limited to: cgroups, namespaces, and copy-on-write storage[1]) which can be used to create isolated sandboxes. The kernel itself doesn't bind the primitives together because I believe Linus would consider that "user space"...and I would agree.

Instead, this is left up to other tools like LXC. Note that higher-level features such as network support are also left up to the higher-level tool.
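
To make the "primitives, not containers" point concrete, here is a minimal sketch that lists the namespaces a process belongs to by reading the symlinks under /proc/self/ns. The helper names (read_namespaces, shared_namespaces) are my own for illustration and not part of LXC or Docker; the sketch assumes a Linux procfs and returns an empty result anywhere else.

```python
import os


def read_namespaces(ns_dir="/proc/self/ns"):
    """Return {namespace_name: identifier} for the entries under ns_dir.

    On Linux each entry is a symlink whose target looks like
    'pid:[4026531836]'. There is no single "container ID" here, only one
    identifier per namespace type.
    """
    result = {}
    if not os.path.isdir(ns_dir):
        # Not Linux, or procfs is not mounted: nothing to report.
        return result
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry is not a symlink or cannot be read; skip it.
            continue
    return result


def shared_namespaces(a, b):
    """Names of the namespaces whose identifiers match in both mappings."""
    return sorted(name for name in a.keys() & b.keys() if a[name] == b[name])


if __name__ == "__main__":
    for name, ident in read_namespaces().items():
        print(f"{name:10s} {ident}")
```

Comparing the output for a process inside a "container" with one on the host (for example via /proc/<pid>/ns, permissions allowing) shows which namespaces a tool actually unshared. That choice is made per namespace by the user-space tool, not by the kernel.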

Docker and LXC have core differences in their vision of what a container should be [2]. Also, Docker used to be based on LXC, but has since written its own library, libcontainer, which handles the interaction with the kernel primitives.

To me, Docker's philosophy and libcontainer implementation are...as you say, fugly, but LXC's approach and implementation are not.

I also don't think of the kernel exposing primitives and letting user space tools bind them together as inherently bad. I actually prefer it this way and think it leaves the kernel cleaner/leaner/better off.

[1] http://www.slideshare.net/jpetazzo/anatomy-of-a-container-na...

[2] https://www.flockport.com/lxc-vs-docker/

The original design wasn't intended to provide that kind of isolation, and the primitives that are exposed were retrofitted; every new containerization design needs an audit that captures the entire exposed functionality of the Linux kernel.

You can just skim this paper to see the problems: non-namespaced identifiers leak in procfs, UID "slides" expose containers to each other's resource limits, there are non-namespaced, non-containerized kernel functions exposed to root inside of containers, and so on.

That's interesting...it was my impression that some of the kernel features were added specifically as a result of the kernel patches that were originally part of the OpenVZ project. Once the kernel adopted official primitives, the original OpenVZ patches were deprecated. It was also at this time that LXC started with some of the same developers from the OpenVZ project.

I could be wrong...but that path dependency seems to indicate that while they were implemented as more general kernel features...one of their motivating use cases was container isolation.

Can anyone more informed clarify the history for me?

I'm not evaluating the container features in isolation. Considered by themselves, they might be perfectly coherent. The problem is that every feature of the kernel with a namespace of any sort needs to be aware of those container features, and namespaces leak into each other unexpectedly, because most of them are very old and were implemented long before anyone considered containerization.
To the best of my knowledge, the container features in the vanilla kernel today (cgroups, as used by LXC, Docker, etc.) originated at Google, where they were used more for resource allocation than for containerization per se. The kernel patches developed by Virtuozzo/Parallels for OpenVZ were never upstreamed, and were considerably different in design from cgroups.
They're talking about namespaces. Cgroups are not an isolation mechanism, and there have been significant rewrites of the core since Google worked on them. Most of the namespace work came from Odin (Parallels) as well as Virtuozzo and others.
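
To illustrate the distinction, here is a rough sketch that parses /proc/self/cgroup, whose lines have the form hierarchy-ID:controller-list:cgroup-path. The function names are hypothetical, and it assumes a Linux procfs (it returns an empty list elsewhere). What it reports is which resource-control groups a process is accounted under, which says nothing about what the process is able to see.

```python
def parse_cgroup_line(line):
    """Split one /proc/<pid>/cgroup line into (hierarchy_id, controllers, path).

    The line format is 'hierarchy-ID:controller-list:cgroup-path'. On a
    cgroup v2 hierarchy the controller list is empty, e.g. '0::/user.slice'.
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    return int(hierarchy_id), [c for c in controllers.split(",") if c], path


def read_cgroups(path="/proc/self/cgroup"):
    """Return the parsed cgroup memberships, or [] if the file is unavailable."""
    try:
        with open(path) as f:
            return [parse_cgroup_line(line) for line in f if line.strip()]
    except OSError:
        # Not Linux, or procfs is not mounted.
        return []


if __name__ == "__main__":
    for hierarchy_id, controllers, cgroup_path in read_cgroups():
        print(hierarchy_id, ",".join(controllers) or "(v2)", cgroup_path)
```
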
Fugly? Compared to what alternatives? The offerings from Microsoft are even fuglier.
In illumos, a descendant of (Open)Solaris, we have a first class container primitive called "zones". In SmartOS, the Joyent-backed distribution of illumos, we also have support for running an entire Linux userland (e.g. Ubuntu or CentOS) in this substrate.

You can have the best of both worlds: a secure container substrate, designed from the ground up as a coherent whole like Jails; and the vast packaging ecosystem provided by Ubuntu.

FreeBSD jails, I would think.
See also: Solaris zones.
Yeah, when that giant Oracle boat finally turns, we are all in for it.
illumos Zones then.
Unless I've missed something (and I may have!), FreeBSD's jails have a very respectable security track record. Really, really want to make use of them.

I can't give up Debian's package system, though, so I'm left hoping that kFreeBSD will amount to something someday and I use Xen or KVM in the meantime... :-(

> I can't give up Debian's package system, though

Why not? What would you miss from it?

I run Debian Testing and FreeBSD 10. I haven't found too much from Debian that I can't get in FreeBSD 10. I could even run a Debian/kFreeBSD jail if I really wanted to.

What really does my head in is that a default Debian install can pull down 2 megabytes a second from a server over SFTP, and a default FreeBSD 10 server can only do ~800 kilobytes per second (FreeBSD 9 was worse).

> What really does my head in is that a default Debian install can pull down 2 megabytes a second from a server over SFTP, and a default FreeBSD 10 server can only do ~800 kilobytes per second

Shouldn't be that much of a difference. You might try OpenSSH from ports; maybe the HPN patches will help if you're on a high-latency connection.

But do enterprise companies use FreeBSD jails regularly? AFAIK they're basically used as toys by developers.
Jails are used on the PlayStation 4, and with some 36 million PS4s sold so far, that's a huge use of jails in production. Here's a quote from an article talking about it:

"We can prove the existence of FreeBSD jails being actively used in the PS4's kernel through the auditon system call being impossible to execute within a jailed environment"

This quote is from: https://cturt.github.io/ps4.html

It's obvious that your parent was talking about servers in production. Isn't this entire thread about that?