| I don't share your opinion. The kernel exposes a collection of primitives (including but not limited to: cgroups, namespaces, and copy-on-write storage[1]) which can be used to create isolated sandboxes. The kernel itself doesn't bind the primitives together because I believe Linus would consider that "user space"...and I would agree. Instead this is left up to other tools like LXC. Also note that higher-level features such as network support are left up to the higher-level tool as well. Docker and LXC have core differences in their vision of what a container should be [2]. Docker used to be based on LXC, but has since written its own library, libcontainer, which handles the interaction with the kernel primitives. To me, Docker's philosophy and libcontainer implementation are...as you say, fugly, but LXC's approach and implementation are not. I also don't think the kernel exposing primitives and letting user space tools bind them together is inherently bad. I actually prefer it this way and think it leaves the kernel cleaner/leaner/better off. [1] http://www.slideshare.net/jpetazzo/anatomy-of-a-container-na... [2] https://www.flockport.com/lxc-vs-docker/ |
You can just skim this paper to see the problems: non-namespaced identifiers leak in procfs, UID "slides" expose containers to each other's resource limits, there are non-namespaced, non-containerized kernel functions exposed to root inside containers, and so on.
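
For anyone who wants to see the namespace primitive from user space, here is a minimal sketch, assuming a Linux host with procfs mounted. It reads the symlinks under `/proc/<pid>/ns`, which is where the kernel reports the namespaces a process belongs to. The helper names `list_namespaces` and `shares_namespace` and the `proc_root` parameter are my own, made up for illustration; they are not part of LXC, Docker, or libcontainer.

```python
import os


def list_namespaces(pid="self", proc_root="/proc"):
    """Return {namespace_name: identifier} for a process.

    Each entry under <proc_root>/<pid>/ns is a symlink whose target
    names the namespace, e.g. 'pid:[4026531836]'.
    Returns an empty dict if the directory cannot be read.
    """
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return namespaces
    for name in entries:
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Entry vanished or is not readable; skip it.
            continue
    return namespaces


def shares_namespace(pid_a, pid_b, kind, proc_root="/proc"):
    """True if both processes report the same identifier for `kind`."""
    a = list_namespaces(pid_a, proc_root).get(kind)
    b = list_namespaces(pid_b, proc_root).get(kind)
    return a is not None and a == b


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:10s} {ident}")
```

Two processes in the same container should report the same identifiers here, while a process on the host should differ for whichever namespaces the container tool unshared. Which namespaces get unshared is exactly the policy the kernel leaves to user space.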