Hacker News
by WestCoastJustin 4392 days ago
Google already has plenty of experience with containers: they rely heavily on cgroups and the concept of containers in their production environment for isolation and resource control, and two of their engineers wrote much of the initial cgroups code. I talked about this in my "Introduction to Containers on Linux using LXC" screencast [1]. Briefly: Google is in the process of open sourcing their internal container code [2], a Wired article covered their container orchestration system [3], and John Wilkes (Google Cluster Management) talks about the container management system [4].

Docker adds very interesting filesystem ideas and software for the management of container images. Personally, I think we are on the cusp of a transition from VPS (xen/hvm) to VPS (containers). I also hope that Google throws some of their concepts at the Docker project. Interesting times for this space.

[1] http://sysadmincasts.com/episodes/24-introduction-to-contain...

[2] https://github.com/google/lmctfy

[3] http://www.wired.com/2013/03/google-borg-twitter-mesos/all/

[4] http://www.youtube.com/watch?v=0ZFMlO98Jkc


"Personally, I think we are on the cusp of a transition from VPS (xen/hvm) to VPS (containers)."

A transition back to, I think ... the very first VPS provider (JohnCompanies, 2001)[1] was based entirely on FreeBSD jails and was entirely container-based, even though those containers were presented as standalone FreeBSD systems.

[1] Yes, Verio did have that odd VPS-like enterprise service earlier than that, but JC did the first VPS as we came to know it.

Pardon my ignorance, but don't containers share the same kernel as the host? Meaning I can't run an Ubuntu container in a BSD jail or vice-versa? I don't want to use containers if it limits my OS choice to being from the same family as the host.
> Personally, I think we are on the cusp of a transition from VPS (xen/hvm) to VPS (containers).

I'm not so sure of that. I think a lot of the use-cases for VMs are based on isolation between users and making sure everybody gets a fair slice. Something like Docker would work well with a single tenant, but for multi-tenant usage Docker would give you all the headaches of a shared host and very few of the benefits of a VM. For those use cases you're probably going to see multiple Docker instances for a single tenant riding on top of a VM.

The likes of Heroku, AWS, Google etc. will likely use Docker or something very much like it as a basic unit to talk to their customers, but underneath it they'll (hopefully) be walling off tenants with VMs first. VMs don't have to play friendly with each other; Docker containers will likely have to behave nicely if they're not to monopolize the underlying machine.

I want option 3: a 4U rack with 32 completely isolated, embedded, standalone quad-core ARM or PPC systems, a network switch and an FPGA on each connected to the switch fabric.

Then we can start doing some interesting stuff past finding new ways to chop computers up.

Very interesting, that would be something I'd buy just to mess around with. I can think of a few ways in which I'd use it right off the bat, and if you give me a couple of hours more I'll have a whole raft of them :)
That does not sound very high density compared to what you can get from a company like Baserock - http://www.baserock.com/servers
I want a hefty FPGA attached to the CPU bus and switch backplane. That will take a lot more power than the ARM core.
32 ARM chips in 4U seems very low to me, just in terms of the TDP a 4U rack is able to dissipate at present. You could increase density a lot.
You could, but I want standard storage per node (PCI-E flash) and redundant PSUs, and the TDP of a hefty FPGA going flat out is a lot larger than that of the ARM core.
> I think a lot of the use-cases for VMs are based on isolation between users and making sure everybody gets a fair slice.

Containers do this.

Hm. We'll see about that. I can see a whole pile of potential issues here with 'breaking out of the docker' on par with escaping from the sandbox and breaking the chroot jail, which I see this as a luxury version of.

Of course you could try to escalate from a VM to the host (see Cloudburst) but that's a rarity.

Docker seems to be less well protected against that sort of thing, but I'm nowhere near qualified to make that evaluation so I'll stick to 'seems' for now. It looks like the jump is a smaller one than from a VM.

Fair usage of resources and security isolation are two VERY different problems. Containers can be VERY good at resource isolation. Security has not really been figured out yet.
This isn't really a "we'll see" issue. It is a fact that containers do resource isolation. :P The security issues are orthogonal.
Containers don't isolate very well. One thing that is easy to do is to make the system do disk output on your behalf just by making lots of dirty pages, or make the system use lots of memory on your behalf due to network activity. And of course there are the usual problems that you already have with VMs such as poor cache occupancy.

Shared hosting of random antagonistic processes is something that many developers are not quite ready to embrace. If you are willing to run your service with poor isolation and questionable security then containers are just the thing. You'll definitely spend less money if you can serve in such an environment.

I beg to differ. If you manage to break out of a container then all the resources of the machine are at your disposal.

So they're orthogonal only as long as the security assumptions hold.

I don't know where this myth came from that you NEED VMs for fair slicing. Linux (and most other OS kernels) has been doing fair slicing just fine for years. I think the disadvantages of containerization are similar to those of OpenVZ VPSes: you can't partition your hard disk and you can't add swap space.
It's not a myth. A VM is effectively a slice of your computer that you can pre-allocate in such a way that that VM cannot exceed its boundaries (in theory, of course, this works perfectly; in practice not always).

So all other things being equal, if you slice up your machine into 5 equally apportioned segments and you run a user process in one of those 5 slices that tries to hog the whole machine, it will only manage to create 1/5th of the load that it would be able to create if it were running directly on the host OS.

So yes, Linux does 'fair slicing' if you can live with the fact that a single process will determine what is fair and what is not. That that process gets pre-empted and that other processes get to run as well does not mean the machine is not now 100% loaded.
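A toy model may make the two positions concrete. The Python below is only a sketch under stated assumptions, not a description of how any real scheduler or hypervisor is implemented: `work_conserving_share` models weighted fair sharing where idle capacity is handed to whoever wants it, and `hard_cap` models pre-allocated slices that can never be exceeded. Both function names, the tenant names and the numbers are hypothetical.

```python
def work_conserving_share(demands, weights, capacity=1.0):
    """Weighted max-min fair share: unused capacity is redistributed.

    demands and weights are dicts keyed by tenant name; demand and
    capacity are fractions of one machine (1.0 == the whole machine).
    """
    alloc = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = capacity
    while active:
        total_weight = sum(weights[name] for name in active)
        # What each still-hungry tenant would get from what is left.
        offers = {name: remaining * weights[name] / total_weight
                  for name in active}
        satisfied = {name for name in active if demands[name] <= offers[name]}
        if not satisfied:
            # Everyone wants more than their share: split by weight and stop.
            alloc.update(offers)
            break
        for name in satisfied:
            alloc[name] = demands[name]
            remaining -= demands[name]
        active -= satisfied
    return alloc


def hard_cap(demands, caps):
    """Pre-allocated slices: a tenant never gets more than its cap."""
    return {name: min(demands[name], caps[name]) for name in demands}


if __name__ == "__main__":
    tenants = ["hog", "a", "b", "c", "d"]
    weights = {name: 1 for name in tenants}
    caps = {name: 0.2 for name in tenants}
    # One tenant tries to use the whole machine, the other four are idle.
    demands = {"hog": 1.0, "a": 0.0, "b": 0.0, "c": 0.0, "d": 0.0}
    print(work_conserving_share(demands, weights))
    print(hard_cap(demands, caps))
```

In this model the hog is held to its fifth under the hard cap even when everyone else is idle, while the work-conserving scheduler lets it load the whole machine and only squeezes it back to a fifth once the other four tenants also want CPU.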

Using quota for disk space, 'nice', per-process limits for memory, chroot jails for isolation and so on you can achieve much the same effect, but a VM is so much easier to allocate. It does have significant overhead and of course it has (what doesn't) its own set of issues, but resource allocation is actually one of the stronger points of VMs.
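As a rough illustration of the 'nice' and per-process memory limit part of that list, here is a minimal sketch using Python's standard `os`, `resource` and `subprocess` modules on a Unix-like host. `run_limited` and its parameters are hypothetical names chosen for this example, and it covers only priority and address-space limits, not disk quota or chroot.

```python
import os
import resource
import subprocess
import sys


def run_limited(argv, max_address_space_bytes, niceness=10):
    """Run argv as a child process with lowered priority and a memory cap.

    Assumes a Unix-like host. The cap is applied with RLIMIT_AS, which
    limits the child's virtual address space, not its resident memory.
    """
    def apply_limits():
        # Runs in the child after fork() and before exec().
        os.nice(niceness)
        resource.setrlimit(
            resource.RLIMIT_AS,
            (max_address_space_bytes, max_address_space_bytes),
        )

    # preexec_fn is not safe in multi-threaded parents; fine for a sketch.
    return subprocess.run(
        argv, preexec_fn=apply_limits, capture_output=True, text=True
    )


if __name__ == "__main__":
    result = run_limited(
        [sys.executable, "-c", "print('hello from a limited child')"],
        max_address_space_bytes=2 * 1024 ** 3,
    )
    print(result.stdout, end="")
```

Every such knob has to be set and kept consistent per process, which is the bookkeeping that cgroups (for containers) or a pre-sized VM take care of in one place.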

Well yes, but KVM is a VM that's just using Linux to do this VM scheduling. The main issue is that the API for containers is less well defined (IO scheduling is not necessarily fully fair with VMs, but it's mainly AIO on the host side at least).
You can add swap space, and there is even swap space accounting support in the kernel. Personally I don't use swap, I just buy fat amounts of RAM and allocate them to diskless worker-nodes in my clusters. As for partitioning, manual partitioning can give a slight speed advantage (if you know which filesystem you want to use, you have a long-lived enough job to justify optimization, etc.), but generally you can just use http://zfsonlinux.org/ or at least LVM2 to avoid the segregation requirement entirely. In the former (ZFS) case you get arbitrary-depth COW snapshots, dynamic reallocation, transparent compression, and other types of useful options for ~free, as well. In the latter (LVM2 LV) case you get single-depth snapshots (though in theory this is improving, e.g. via thin provisioning) but no dynamic resizing support (AFAIK, unless you use nonstandard filesystems).
Great points. Btw, Victor and Rohit from the Google LMCTFY team are now active maintainers of libcontainer. https://github.com/dotcloud/docker/blob/master/pkg/libcontai...

Also check out Joe Beda's deck from GlueCon: http://slides.eightypercent.net/GlueCon%202014%20-%20Contain...

Docker is a natural fit for GCE.

Great links -- thanks!
I'm always surprised that OpenVZ[1] doesn't come up more in discussions about containers. I used to use it extensively for small lightweight VPSes (usually just an Apache or MySQL instance) and always found it to be pretty impressive. I've used Docker to make a Debian-packaged app available on a bunch of CentOS machines and it saved me a huge headache (the dependency tree for the app was huge) so I'm a fan - but still a little puzzled at OpenVZ's seeming invisibility.

[1] http://openvz.org/Main_Page

OpenVZ was basically the prototype for LXC. Distros seem to have better support for LXC since it's "official".
Yeah, I realise its place in LXC history. It still seems slightly odd that it's been kind of overlooked. It offered quite a lot (and still does) that I don't see replicated in any of the other container packages. At least not without quite a lot of manual faff.

Possibly it was just a little ahead of its time and was also overshadowed by the rise of HW virtualisation in the late 2000s. Having to install a custom kernel (certainly when I used it) was also a bit of a hassle, mind you. Anyway - maybe someone will re-invent the toolchain using Swift or Node and it'll become cool again ;-)

It also has a security focus, while Docker has started with convenience for deployment rather than looking like an isolated machine.
Don't forget Linux-VServer as well.
It's probably because of the way OpenVZ is marketed (or should I say, not marketed). OpenVZ's technology could probably do the same as what Docker does but they're not marketing it in the same way. The concept matters just as much as the actual technology.
I guess. Having a commercial version with a different name to push can't have helped the branding either.
There are also recent articles about automatic, machine-learning-based power and core management [1], [2].

If anyone here specializes in similar things, I would be curious to know if this Pegasus system runs on top of or underneath Borg/Omega (or perhaps replaced it?), or is a separate system altogether.

[1] http://www.theregister.co.uk/2014/06/05/google_pegasus_syste...

Edit: [2] http://gigaom.com/2014/05/28/google-is-harnessing-machine-le...

> Personally, I think we are on the cusp of a transition from VPS (xen/hvm) to VPS (containers).

There may be some of that, but I think more common will be continuing to have traditional IaaS bring-your-OS-image services, with a new tier in between IaaS and PaaS (call it CaaS -- Container host as a service), plus a common use of IaaS being to deploy an OS that works largely as a container host (something like CoreOS).