by davidstrauss 4751 days ago
> Citation needed.

First, I define containers by their capabilities, not a vendor or kernel calling them "containers." The Linux kernel, internally, has no concept of containers; it merely provides the resource-sharing and security-isolation building blocks to implement containers. It's userland utilities like LXC that assemble the necessary blocks together to provide what administrators recognize as containers.

So, let's talk non-virtualization-based resource- and security-isolation.

Workload Manager (WLM) [1] has been a part of IBM z/OS since before it was even called z/OS. WLM implemented something like cgroups scheduling, in that existing utilization samples feed into future scheduling to ensure responsiveness and fair access to resources under contention.

The Resource Access Control Facility (RACF) has mapped allowed program access to resources since 1985 [2], which functions similarly to modern mandatory access control.

> I'd go as far as to say that there is no such thing as a mature container technology. The fundamental problem of containers is that mainstream kernels are not designed to be multi-tenant.

It's your turn to provide some citations. You could argue that the Linux kernel has failed to achieve good multi-tenancy support, but saying it hasn't been designed for multi-tenancy just makes you look ignorant of the last five years of development on namespaces, cgroups, and general multiprocess resource scheduling.

> They support multiple users quite well but when you try to have multiple root users, badness ensues.

Containers aren't about having multiple root users, even though that's technically possible if you opt to use Linux UID namespaces, which aren't mandatory.
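For readers unfamiliar with UID namespaces: the kernel exposes each user namespace's mapping in /proc/<pid>/uid_map as lines of the form "first-ID-inside first-ID-outside length". That means UID 0 inside a container can correspond to an ordinary unprivileged UID on the host. Below is a minimal Python sketch of that lookup. It assumes an illustrative single-range mapping (0 100000 65536). The helpers parse_uid_map and to_host_uid are hypothetical, and nothing here creates a namespace.

```python
# Hypothetical helpers: translate a UID seen inside a user namespace into the
# host UID, using the three-column format of /proc/<pid>/uid_map:
#   <first ID inside ns> <first ID outside ns> <length of range>
from typing import List, Optional, Tuple

UidMapEntry = Tuple[int, int, int]


def parse_uid_map(text: str) -> List[UidMapEntry]:
    """Parse uid_map-style text into (inside, outside, length) tuples."""
    entries = []
    for line in text.strip().splitlines():
        inside, outside, length = (int(field) for field in line.split())
        entries.append((inside, outside, length))
    return entries


def to_host_uid(uid: int, entries: List[UidMapEntry]) -> Optional[int]:
    """Return the host UID for a namespace UID, or None if it is unmapped."""
    for inside, outside, length in entries:
        if inside <= uid < inside + length:
            return outside + (uid - inside)
    return None  # the kernel reports unmapped IDs as the overflow UID (usually 65534)


# Illustrative mapping: container UIDs 0-65535 map onto host UIDs 100000-165535.
mapping = parse_uid_map("0 100000 65536")
print(to_host_uid(0, mapping))      # container "root" -> 100000 on the host
print(to_host_uid(1000, mapping))   # -> 101000
print(to_host_uid(70000, mapping))  # outside the mapped range -> None
```

The container's "root" is just host UID 100000 here, which is why UID namespaces make multiple in-container root users possible without handing out real root.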

> I'm quite sure it's not faster than virtualization.

Citation needed, especially since you're arguing that more layers of abstraction (kernel + hypervisor + kernel) are as fast as or faster than fewer (kernel + cgroups/namespaces).

[1] http://en.wikipedia.org/wiki/Workload_Manager
[2] http://www-03.ibm.com/systems/z/os/zos/features/racf/racfhis...


> Workload Manager (WLM) [1] has been a part of IBM z/OS since before it was even called z/OS. WLM implemented something like cgroups scheduling, in that existing utilization samples feed into future scheduling to ensure responsiveness and fair access to resources under contention.

I am intimately familiar with eWLM and I think it's quite unfair to call it containers.

> It's your turn to provide some citations. You could argue that the Linux kernel has failed to achieve good multi-tenancy support, but saying it hasn't been designed for multi-tenancy just makes you look ignorant of the last five years of development on namespaces, cgroups, and general multiprocess resource scheduling.

I'll give you a simple example. Do dd if=/dev/sda of=foo.img (or even /dev/null) in one container and measure I/O performance in the other.

The page cache is a global resource, and so far there is no way to isolate it within containers. Buffered I/O is pretty fundamental to all workloads.

>> I'm quite sure it's not faster than virtualization.
>
> Citation needed, especially since you're arguing that more layers of abstraction (kernel + hypervisor + kernel) are as fast as or faster than fewer (kernel + cgroups/namespaces).

cgroups aren't free. See https://www.berrange.com/posts/2013/05/13/a-new-configurable...

I already cited a widely published benchmark (SpecVIRT). Your article cites made-up numbers (5-10 minutes to provision a guest).

  $ qemu-img create -f qcow2 -b template.img new-guest.img
  $ qemu -hda new-guest.img -enable-kvm

And I have a nearly instant guest under KVM. So I don't know where your 5-10 minute number comes from but it's clearly not correct.
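The near-instant start in the qemu-img example comes from copy-on-write. The new qcow2 image records template.img as its backing file and initially stores nothing of its own. The toy Python model below illustrates that behavior under simplifying assumptions: block-granular dict storage and a hypothetical CowOverlay class. It is not how qcow2 is actually laid out on disk.

```python
# Toy copy-on-write overlay, similar in spirit to a qcow2 image created with
# a backing file: creation copies nothing, reads fall through to the shared
# base, and only blocks this guest writes are stored in its own overlay.
from typing import Dict, Optional


class CowOverlay:
    def __init__(self, backing: Dict[int, bytes]):
        self.backing = backing              # shared, never-modified base image
        self.delta: Dict[int, bytes] = {}   # blocks written by this guest only

    def read(self, block: int) -> Optional[bytes]:
        # Prefer this guest's own copy; otherwise fall through to the base.
        if block in self.delta:
            return self.delta[block]
        return self.backing.get(block)

    def write(self, block: int, data: bytes) -> None:
        self.delta[block] = data            # the base image is left untouched


template = {i: b"base" for i in range(1024)}  # stand-in for template.img
guest_a = CowOverlay(template)  # constant time: nothing is copied at creation
guest_b = CowOverlay(template)
guest_a.write(7, b"mine")
print(guest_a.read(7), guest_b.read(7), len(guest_a.delta))  # b'mine' b'base' 1
```

This trick only works when the base image is already on the host, which is the assumption the two commenters go on to dispute.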

Note that I'm not normally one to badmouth anything. Containers are fine for that sort of thing. But you're claiming that containers obsolete virtualization, and that's just plain silly.

I expect better from you :-)

> I am intimately familiar with eWLM and I think it's quite unfair to call it containers.

I feel like you're strawmanning me here. I specifically avoided calling WLM "containers." I said they're building blocks that, combined with MAC-style service isolation (which also existed on 1980s mainframes), allow containerized isolation of consolidated services on a single OS image. It also wouldn't be fair to directly call cgroups "containers," though they provide similar building blocks for what I'm sure we agree are "containers" once configured by a tool like LXC.

I compared WLM to cgroups because both track utilization over a window of time and manage scheduling in a way that avoids starvation while enforcing other priority or fairness goals. The documented resource limitation and contention-sharing configurations for WLM [1] are undeniably similar to cgroups.
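The shared idea here is that usage history feeds the next scheduling decision, so weights are honored and nobody starves. To make that concrete, here is a toy fair-share scheduler in Python. It is a hypothetical illustration under simple assumptions (one resource, fixed-size slices, exponentially decayed usage) and is not WLM's or the Linux CFS/cgroup algorithm.

```python
# Toy fair-share scheduler (hypothetical; not WLM's or the kernel's algorithm).
# Past utilization, decayed over time, decides who runs next: a group that has
# gone without service sees its recorded usage shrink until it is picked again.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Group:
    name: str
    weight: float
    usage: float = 0.0  # exponentially decayed history of slices consumed


def schedule(groups: List[Group], ticks: int, decay: float = 0.9) -> Dict[str, int]:
    """Run `ticks` one-slice rounds; return how many slices each group received."""
    slices = {g.name: 0 for g in groups}
    for _ in range(ticks):
        # Pick the group whose recent usage is lowest relative to its weight.
        chosen = min(groups, key=lambda g: g.usage / g.weight)
        slices[chosen.name] += 1
        for g in groups:
            g.usage *= decay      # older samples count for less
        chosen.usage += 1.0       # charge the slice just consumed
    return slices


shares = schedule([Group("batch", 1.0), Group("online", 3.0)], ticks=400)
print(shares)
```

With weights 1 and 3, the slice counts come out close to a 1:3 split. They land slightly off exactly 1:3 because one slice is large compared with the decayed history.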

> I'll give you a simple example. Do dd if=/dev/sda of=foo.img (or even /dev/null) in one container and measure I/O performance in the other.

Linux kernel cgroups have supported block I/O limits and weights for fair sharing under contention since 2008. I'll give you the benefit of the doubt here, though, and assume you're actually trying to provide an example of breaking the page cache because you mention that immediately afterward.
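Here is what "weights for fair sharing under contention" means, shown as a toy weighted max-min (water-filling) allocation of one device's bandwidth in Python. The group names, weights, and MB/s figures are made up. The kernel's I/O controllers implement proportional sharing differently. This only shows the arithmetic idea: a saturating dd is held to its weighted share, and capacity another group isn't using is not wasted.

```python
# Toy weight-proportional, work-conserving sharing of one device's bandwidth.
# Illustrative only; real cgroup I/O controllers differ in mechanism.
from typing import Dict


def weighted_shares(capacity: float, weights: Dict[str, float],
                    demands: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min fair allocation (water-filling)."""
    alloc = {name: 0.0 for name in weights}
    active = {name for name in weights if demands[name] > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(weights[n] for n in active)
        # Offer each still-hungry group its weighted slice of what is left.
        offer = {n: remaining * weights[n] / total_weight for n in active}
        satisfied = {n for n in active if demands[n] - alloc[n] <= offer[n]}
        if not satisfied:
            # Everyone wants more than their slice: hand out the slices and stop.
            for n in active:
                alloc[n] += offer[n]
            break
        # Groups needing less than their slice take only what they need;
        # the leftover is redistributed among the others next round.
        for n in satisfied:
            remaining -= demands[n] - alloc[n]
            alloc[n] = demands[n]
        active -= satisfied
    return alloc


busy = weighted_shares(100.0, {"dd": 100, "web": 300}, {"dd": 1000.0, "web": 100.0})
light = weighted_shares(100.0, {"dd": 100, "web": 300}, {"dd": 1000.0, "web": 10.0})
print(busy)   # both saturate: dd gets 25.0, web gets 75.0
print(light)  # web needs only 10.0, so dd may use the remaining 90.0
```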

> The page cache is a global resource, and so far there is no way to isolate it within containers. Buffered I/O is pretty fundamental to all workloads.

Shared page caching among containers is also one of the primary efficiency gains when multiple containers access the same on-disk assets. All along the spectrum, there are trade-offs between optimization via sharing (which risks decreased predictability) versus isolated resources (which necessitate efficiency losses through redundancy). I don't suggest containers are special in this regard.

Regardless, this same issue exists for virtualization, especially once you get down to areas like the caches on RAID controllers or other buffering done on the host machine. Isolating resources and security always has overhead.
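Here is a back-of-the-envelope version of the sharing side of that trade-off, using made-up numbers. Suppose 20 instances all read the same 300 MiB of on-disk assets. Containers on one kernel can serve them from a single copy in the host page cache. Twenty VMs, by contrast, each hold their own copy in their guest kernel's cache (ignoring host-side deduplication such as KSM). The hypothetical helper below just encodes that arithmetic.

```python
# Illustrative arithmetic only, not measurements: memory needed to keep a set
# of common on-disk assets cached for every instance.
def cache_footprint_mib(instances: int, shared_assets_mib: int, shared: bool) -> int:
    """One shared copy (single kernel) versus one copy per guest kernel."""
    return shared_assets_mib if shared else instances * shared_assets_mib


containers = cache_footprint_mib(instances=20, shared_assets_mib=300, shared=True)
vms = cache_footprint_mib(instances=20, shared_assets_mib=300, shared=False)
print(containers, vms)  # 300 6000
```

The cost of that saving is the predictability concern raised above. Under memory pressure, one tenant's streaming reads can evict pages another tenant was relying on.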

> cgroups aren't free. See https://www.berrange.com/posts/2013/05/13/a-new-configurable....

Of course they're not free, but your suggestion that they're heavier-weight than hypervisors running guest kernels is both counterintuitive and unexplained/undocumented in your post.

> I already cited a widely published benchmark (SpecVIRT).

There wasn't anything you "cited" about the benchmark other than its existence, because you didn't provide any numbers from the benchmark to support your arguments.

So, your post is at the stage of hypothesis: you have a prediction and a way to falsify it. Please don't pretend the results have already come out in your favor.

> Your article cites made-up numbers (5-10 minutes to provision a guest).

The numbers in the article are based on wall time for provisioning servers from public images on the Rackspace Cloud and EC2 using the API. Your qemu-img example is contrived because real-world systems don't keep local images sitting around on the host machines to provision new instances, which rules out the twin advantages of local, high-bandwidth I/O and copy-on-write.

In contrast, container creation that shares existing host machine libraries and binaries is very fast, including in real-world deployments (like PaaS providers).

> Note that I'm not normally one to bad talk anything. Containers are fine for that sort of thing. But you're claiming that containers obsolete virtualization and that's just plain silly.

But you're strawmanning me again. I never said "containers obsolete virtualization." I said that containers will be "the future of the cloud," meaning that they should come to dominate given how well they're starting to fit most cloud users' needs. That doesn't imply obsolescence any more than saying "ATMs are the future of consumer banking" implies that tellers are obsolete and have no role in any sort of transaction.

> I expect better from you :-)

Well, I'd hope so, given our time in college together. I just realized this, by the way. :-)

[1] http://pic.dhe.ibm.com/infocenter/aix/v7r1/index.jsp?topic=%...

I just timed a spin-up of a 16GB Fedora 18 instance in the current-generation Rackspace Cloud's DFW data center. It took 7 minutes and 10 seconds to complete. Launching a somewhat larger-in-RAM instance on EC2 in Oregon took over 10 minutes; I stopped timing. In my experience running an infrastructure that's gone through thousands of cloud VM launches, those numbers are typical.