by diogomonicapt 3781 days ago
Disclaimer: I work for Docker

For the security enthusiasts out there, Docker 1.10 comes with some really cool security-focused additions. In particular:

- Seccomp filtering: you can now use BPF to filter exactly which system calls the processes inside your containers can use.

- Default Seccomp Profile: Using the newly added Seccomp filtering capabilities, we added a default Seccomp profile that will help reduce the surface exposed by your kernel. For example, last month's use-after-free vuln in join_session_keyring was blocked by our current default profile.

- User Namespaces: root inside of the container isn't root outside of the container (opt-in, for now).

- Authorization Plugins: you can now write plugins for allowing or denying API requests to the daemon. For example, you could block anyone from using --privileged.

- Content Addressed Images: The new manifest format in Docker 1.10 is a full Merkle DAG, and all the downloaded content is finally content addressable.

- Support for TUF Delegations: Docker now has support for read/write TUF delegations, and as soon as notary 0.2 comes out, you will be able to use delegations to provide signing capabilities to a team of developers with no shared keys.
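
To make the authorization plugin bullet concrete, here is a minimal Python sketch of the decision logic such a plugin might apply to block --privileged. It only shows the allow/deny decision, not the HTTP server a real plugin would need. The field names (RequestMethod, RequestURI, RequestBody, Allow, Msg) and the idea that --privileged appears as HostConfig.Privileged in the container-create body are assumptions based on my reading of the plugin protocol, so check them against the Docker documentation before relying on them.

```python
import base64
import json


def authorize(authz_request):
    """Decide on one daemon API request and return an allow/deny response.

    Assumed request shape: a dict with "RequestMethod", "RequestURI" and
    "RequestBody", where RequestBody is the base64-encoded JSON body of
    the original API call (it may be missing for requests without a body).
    """
    raw_body = authz_request.get("RequestBody") or ""
    body = {}
    if raw_body:
        try:
            body = json.loads(base64.b64decode(raw_body))
        except ValueError:
            # Undecodable body: fail closed rather than guess.
            return {"Allow": False, "Msg": "could not parse request body"}

    # Assumption: `docker run --privileged` shows up as
    # HostConfig.Privileged == true in the container-create request.
    host_config = body.get("HostConfig") if isinstance(body, dict) else None
    if isinstance(host_config, dict) and host_config.get("Privileged"):
        return {"Allow": False, "Msg": "privileged containers are not allowed"}

    return {"Allow": True, "Msg": ""}
```

The function fails closed on a body it cannot parse, which seems like the safer default for a security control, although a real plugin may prefer to log and allow.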

These are just a few of the things we've been working on, and we think these are super cool.

Check out more details here: http://blog.docker.com/2016/02/docker-engine-1-10-security/ or let me know if you have any questions.

4 comments

It's "funny": yesterday rkt announced its version 1.0 (with an emphasis on security), and today we have two news items about Docker at the top of HN, plus your comment about security.
By the way, you can use DockerSlim [1] to auto-generate custom seccomp profiles (in addition to shrinking your image). They are already usable, but they can be improved. Any enhancements or ideas are appreciated.

[1] http://dockersl.im

Any idea on the priority of getting a container with working systemd?

https://github.com/docker/docker/pull/5773 and https://github.com/docker/docker/issues/3629

Disclaimer: I work for SUSE, specifically on Docker and other container technologies.

Docker containers /in principle/ do work with systemd. They are implemented as transient units when you use --exec-opt native.cgroupdriver=systemd (in your daemon's cmdline). I've been working on making this support much better (in runC and therefore in Docker); however, systemd just has bad support for many of the new cgroups when creating transient units.

So really, Docker has systemd support. Systemd doesn't have decent support for all of the cgroup knobs that libcontainer needs (not to mention that systemd has no support for namespaces). I'd recommend pushing systemd to improve their transient unit knobs.

But I'd rather like to know why the standard cgroupfs driver doesn't fulfil your needs? The main issue we've had with systemd is that it seems to have a mind of its own (it randomly swaps cgroups and has its own ideas about how things should be run).

I'm not sure we are talking about the same thing here. I'm talking about systemd inside a container (as pid 1). I think that's the part that's not working.

Every few days someone comes up with a new run script for docker (baseimage "my_init", etc). I personally use supervisord. Since systemd is already universal, might as well use that.

Somebody posted this yesterday - https://news.ycombinator.com/item?id=11019143

I'm already running my containers on a Debian host with systemd, so that is OK. Overlayfs is still causing some problems though.

> I'm talking about systemd inside a container (as pid 1). I think that's the part that's not working.

Ah sorry, I misunderstood. I'm not sure why you'd want to use systemd as your process manager in a container. systemd has a very "monolithic" view of the system and I'm not sure you gain much by using systemd over supervisord (I'd argue you lose simplicity for useless complexity).

> Overlayfs is still causing some problems though.

I've been looking into overlayfs and I really encourage you to not use it. There has been an endless stream of bugs that the Docker community has discovered in overlayfs, and as far as I can see the maintainer is not particularly responsive. There are also some other issues (not being POSIX complete) which are pretty much unresolvable without redesigning it.

Whoa... Thank you so much for pointing out the issue with overlays. There seems to be no real consensus on what should be used. Could you talk about what should be used?

Just FYI - we use Debian on Linode.

Devicemapper works okay (make sure you don't use loop devices). Unfortunately it's slow to warm up, but it's probably the most tested storage driver (it's the default on a bunch of systems).

btrfs works pretty well and is quite a bit faster. It's the default for SLE and openSUSE (as well as other distros which use btrfs by default). I'd recommend it (but I can't remember if it requires you to use btrfs on your / partition, which might be an issue for you).

ZFS is an awesome filesystem, but I doubt it has had much testing under Docker, so I'd be wary about using it.

And I've already told you what I think about overlay. I'd like to point out that it's a good filesystem for what it was designed for (persistent or transient livecds), but the hacks Docker needs in order to use it for layering keep me up at night.

What's the advantage of having a "supervisor" inside the container, rather than just "supervising" the container itself?

Because "only one process in a container" is a dangerous rule (because it has so many exceptions). In certain cases, that idea makes sense, but you shouldn't contort your app such that you only have one process in every container. Not to mention that there are other issues (PID 1 has unique problems that databases and your app aren't used to having to deal with).

Maybe you should read this [1]. We have always run all processes under a supervisor.

[1] https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zomb...

I think this was the problem that [1] was targeting.

[1] http://engineeringblog.yelp.com/2016/01/dumb-init-an-init-fo...

I see, so it looks like if your process spawns other processes and doesn't reap them when they die, you end up in trouble with zombie processes in docker.
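
As a rough illustration of what "reaping" means here, the following Python sketch shows the core loop of a tiny init process: it starts the real workload as a child and then waits on every child, including orphans that get re-parented to PID 1. It is not how dumb-init or my_init are implemented; run_as_init is a hypothetical name, and signal forwarding (which a real init also needs) is left out to keep it short.

```python
import os
import sys


def run_as_init(argv):
    """Run argv as the main workload and reap every child until it exits.

    Returns the workload's exit code. Signal forwarding is deliberately
    omitted from this sketch.
    """
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)  # child: become the workload
        finally:
            os._exit(127)  # only reached if exec failed

    while True:
        # wait() returns for ANY child, including orphans that were
        # re-parented to this process because it is PID 1.
        pid, status = os.wait()
        if pid != main_pid:
            continue  # an orphan was reaped, so it is no longer a zombie
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        return 128 + os.WTERMSIG(status)  # shell convention for signals


def main():
    # Intended to be the container's entrypoint, wrapping the real command.
    sys.exit(run_as_init(sys.argv[1:]))
```

If the workload itself called wait() only for the children it knows about, anything that got re-parented to it would stay a zombie, which is the failure mode described above.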
> The new manifest format in Docker 1.10 is a full Merkle DAG, and all the downloaded content is finally content addressable.

Can someone elaborate on this a bit more? From a CS point-of-view, sounds like a problem where a data structure came in handy but I'm not sure what it solves. Thanks!

A simple immutable data structure can be implemented with a Merkle DAG. Each node in a Merkle DAG stores the hashes of the nodes it points to, and a DAG is a directed graph that never loops back on itself. Examples include simple blockchains. These structures provide immutable, versioned control of information. Containers are immutable, or like to think they are at least, so blockchains are an obvious thing to use in conjunction with deployments of said containers. At least that's what I keep telling everyone.

Do you literally mean a proof-of-work backed blockchain (like Bitcoin), or something more like git, which has a similar structure to a blockchain without the consensus mechanism?

I don't see how the former would be useful to someone deploying containers, but interested to hear your thoughts in either case.

From the blogpost:

> Image IDs now represent the content that is inside an image, in a similar way to how Git commit hashes represent the content inside commits.
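
To show what "IDs represent the content" buys you, here is a small Python sketch of a Merkle DAG. Every node's ID is a hash over its own payload plus the IDs of the nodes it points to, so changing any layer changes the ID of everything above it. This is not Docker's actual manifest format: node_id, image_id and the JSON encoding are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def node_id(payload, child_ids=()):
    """ID of a node: SHA-256 over its payload and the IDs of its children."""
    encoded = json.dumps(
        {"payload": payload, "children": list(child_ids)}, sort_keys=True
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()


def image_id(config, layers):
    """Hypothetical image ID: a root node that points at its layer nodes."""
    layer_ids = [node_id(layer) for layer in layers]
    return node_id(config, layer_ids)


# Same content always gives the same ID; changing one layer changes the root.
original = image_id("cmd=/bin/sh", ["base layer", "app layer v1"])
modified = image_id("cmd=/bin/sh", ["base layer", "app layer v2"])
print(original != modified)
```

The practical upshot is that an ID can be verified by rehashing what was downloaded, and identical layers are naturally deduplicated because they share an ID.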

I was referring to the second part of the comment:

"Containers are immutable, or like to think they are at least, so blockchains are an obvious thing to use in conjunction with deployments of said containers. At least that's what I keep telling everyone."

I'm pretty sure they mean that they are using Merkle DAGs. A blockchain is a Merkle DAG. The proof-of-work algorithm in Bitcoin is an algorithm for deciding how a node gets added to the blockchain. Depending on how you look at it, that algorithm is not part of what makes it a "blockchain".

Admittedly people are sloppy about how they use the term "blockchain". I would prefer that people use the term Merkle DAG and forget the term "blockchain" altogether, but I think we are stuck with "blockchain" ;-)

Blockchains don't have to contain proof-of-work as long as the values in the chain itself aren't valued in and of themselves over longer time periods. In Bitcoin, a cryptocurrency built using a blockchain, the values represent debt owed to someone in exchange for a real world item, and that debt stays active for the life of the entry. There are a slew of proof-of-somethings that allow blockchains to become cryptocurrencies. I don't exactly subscribe to these ideas of value store as related to compute provisioning, but I suppose there could be some actions which might benefit from it, such as certain types of licensing.

That said, triggering provisioning using cryptocurrencies is likely to be a thing at some point.

Is a git repository one implementation of a Merkle DAG?
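
For context on that question: git names every object by a hash of its content, and trees and commits refer to other objects by those names, which is the defining property of a Merkle DAG. The Python sketch below reproduces the ID git assigns to a blob, assuming the default SHA-1 object format; git_blob_id is a hypothetical helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a blob (assuming the SHA-1 object format)."""
    # git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

A tree object lists the IDs of its blobs and subtrees, and a commit lists the ID of its tree and of its parent commits, so changing a single file changes every ID on the path up to the commit.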