Hacker News
by iyn 4059 days ago
I've been using CoreOS & Docker in production for about 3 months now (stable channel on AWS). At the moment I have a "cluster" of 2 machines on AWS and 1 simple CI server on DigitalOcean, also on CoreOS & Docker. It wasn't easy for me to get used to "the Docker way" of doing things, but I think I'm quite fluent in using Docker & building containers now. Setting everything up is very easy & productive, and they have great documentation (example: https://coreos.com/docs/running-coreos/cloud-providers/ec2/). If you're interested, you can set up test VMs using Vagrant; this takes about 5 minutes: https://coreos.com/docs/running-coreos/platforms/vagrant/

However, I don't really feel comfortable with Docker security and I will probably switch to rkt, which has more focus on security and a better approach to containers, IMO. CoreOS is an incredibly good product; these people see the future. Full disclosure: I'm a very happy user of CoreOS products.


I'm in a similar place with regard to Docker versus Rocket, but I'm not a big fan of CoreOS right now. I think it's got some really neat ideas, but in my experience "everything in a container" is as yet unmanageable at scale. And I'm very uncomfortable with signing over the entirety of my infrastructure to a small, venture-backed company (which by itself makes me extremely uncomfortable when it comes to running my platform and infrastructure) whose goals are not clear to me. (I have less of a beef with, say, Chef, which has much more of a history, more generally understood goals and a business model, but I'm still not super comfortable with them, either.)

I also don't much care for etcd, because IMO Zookeeper's rep for complexity is hugely overblown and most folks end up re-implementing Zookeeper poorly in etcd, but that's a side thing.

All that said, I agree with you that Rocket is a much, much better idea and design, and that despite my misgivings about their corporate goals CoreOS is a way more serious project from a security standpoint than Docker. I'm excited to see this, if only because I think Rocket will pick up some dV from this.

I believe that the current state of containers is just the beginning of the trend and of container development. IMO, containerization simplifies security, enables fast deployment cycles, eases dependency problems and improves some other things. There are problems, of course, and things that are not yet solved in the container world. The obvious one is persistent storage: we don't have a good container-based solution for it. There are "patterns" (for example, running a specialized database container that shares a "volume" with the host), but that's just a hacky trick, not a solution. I really hope it'll be solved. Using AWS RDS (Amazon Relational Database Service) and S3 is kind of a "workaround" (a pretty good workaround, to be honest), but not everybody wants to use AWS.

I really like Ansible and I believed that Ansible could help with Docker deployments (http://www.ansible.com/docker), but at the moment I think that using Dockerfiles is faster and simpler. I'm still using Ansible for Vagrant provisioning, but I don't need it for setting up production/CI anymore. Sure, Ansible/Chef/Puppet will probably stay on the market and be used by some big players, but I'd be very surprised if we don't see a big shift to containers at most technology companies. Speed, productivity and cost effectiveness (related: https://www.youtube.com/watch?v=xK0njkATf84) are good enough arguments, IMO.

Just anecdotally, I've had the shittiest experience in recent memory with CoreOS. Their choice of BTRFS bites me pretty much daily whenever logs start writing stack traces for errors. A similar problem happened when Deis chose CephFS for their registry container.

While I'm a huge fan of Docker I will maintain that CoreOS is not ready for production in most capacities and recommend against using it in its current form.

Can you elaborate on "whenever logs start writing stack traces for errors."? What caused the problem? And what exactly was the problem with Deis & CephFS? I was considering using those some time ago.
Sure. Whenever log messages get fairly large (for example, when they start including tracebacks due to errors) and you have lots of errors happening back to back (say, high traffic is causing some deadlock exception at the DB layer, which was our actual problem), the heavy write load causes BTRFS to lock up. Usually this happens because of some kind of kernel-level error; our ops guy is familiar with the exact details of the exception BTRFS throws. Suffice it to say, when that happens, it's not immediately obvious that the instance is unavailable. If you have plenty of errors in a short time span, this behavior will start to roll across your cluster, and depending on whether all of the hosts are running the offending container, your entire fleet of hosts can lock up. This sucks badly. The only real solution is to bounce the host machines, though they seem to come back just fine.

CephFS had a similar-ish problem: if you ever ended up with fewer than 3 nodes (almost always because of the above), it would have real trouble self-healing, get confused and re-elect bad nodes to quorum leadership, possibly clobbering all your registry data. We contributed a patch back to Deis to use S3 as a registry persistence layer because, well, having a volatile registry sucked. We were about to develop a control layer where quorum services would live separately from the application services we were developing, with a proxy in the quorum layer for communicating with things like etcd. I would highly recommend this approach if you end up with any services that require a quorum.
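The "fewer than 3 nodes" failure mode follows from majority-quorum arithmetic, which quorum services such as etcd (Raft) and Ceph's monitors (Paxos) depend on: n voting members need a majority of floor(n/2) + 1 to make progress, so they can lose only the remaining members. A minimal sketch of that arithmetic; quorum_size and failures_tolerated are hypothetical helper names, not part of either project:

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of n voting members (hypothetical helper)."""
    if n < 1:
        raise ValueError("a quorum needs at least one member")
    return n // 2 + 1


def failures_tolerated(n: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return n - quorum_size(n)


# Print the trade-off for small clusters.
for n in range(1, 6):
    print(f"{n} nodes: quorum={quorum_size(n)}, can lose {failures_tolerated(n)}")
```

Two nodes tolerate no more failures than one, which is why dropping below 3 members removes any safety margin, and why odd cluster sizes are the usual recommendation.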

Thanks for the reply.

I'd prefer to avoid so many disk operations in the first place. Not sure if it applies to your problem, but have you thought about using something like Sentry (https://getsentry.com/ || https://github.com/getsentry/sentry)? Maybe there is some other tool that could help you here; I'm often impressed by the open source community and the vast number of different packages for (almost ;)) every technological problem.
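If the goal is simply fewer disk writes during an error storm, one option short of an external service is to drop repeated identical errors at the logging layer. A minimal sketch using only Python's standard logging module, assuming the application logs through it; DedupFilter and window_seconds are hypothetical names (not something anyone in this thread described using), and this only reduces write volume rather than fixing the BTRFS lock-ups themselves:

```python
import logging
import time


class DedupFilter(logging.Filter):
    """Hypothetical filter: suppress records whose level, message and
    traceback text repeat within window_seconds of the last emitted copy."""

    def __init__(self, window_seconds: float = 60.0, clock=time.monotonic):
        super().__init__()
        self.window_seconds = window_seconds
        self.clock = clock
        self._last_emitted = {}  # key -> time the key was last let through

    def filter(self, record: logging.LogRecord) -> bool:
        exc_text = ""
        if record.exc_info:
            exc_text = logging.Formatter().formatException(record.exc_info)
        key = (record.levelno, record.getMessage(), exc_text)
        now = self.clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.window_seconds:
            return False  # identical error seen recently: skip this write
        self._last_emitted[key] = now
        return True


# Wiring it up: attach the filter to the handler that writes to disk/stdout.
logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.addFilter(DedupFilter(window_seconds=30))
logger.addHandler(handler)
```

The dictionary of seen keys grows with every distinct error in this sketch; a long-running service would want to prune old entries.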

FYI, they switched away from btrfs a while back. I think you need to reinstall with a newer version to get that, though; it won't change on upgrade.
What security differences do you see now?
Do you mean the differences compared to the way I was setting up servers before CoreOS? Well, this is subjective, of course, but I feel that it's easier & faster to isolate everything now. I still have to add iptables rules (I also use AWS port restrictions) and add SSH keys, but that's all I have to do. Auto-updates with scheduled reboots and safe rollback work out of the box, and I have only a few attack vectors in terms of services exposed to the public network. When I have everything in 1 data center, the "internal" tools (etcd, fleet, confd, locksmithd) are open only on the private network and just ports 22, 80 & 443 have to be exposed. Containers are also "linked" over the private network, so I can easily and securely connect services/applications in containers running on different machines without much overhead.

Often the easiest attack vectors are security vulnerabilities in applications exposed to the public network. Containers are great here, because when an application is compromised, only that particular container is affected (and probably other containers running the same app). I can just run docker kill app && docker rm app and the rest of my containers are (probably) OK. The problem is that an attacker can gain access to data in etcd, since it's not encrypted by default and has no per-user permissions (though you can use HTTPS in an etcd cluster, which is good). As of now, however, you can use something like crypt (https://github.com/xordataexchange/crypt) to store gpg-encrypted values in etcd (with a natural API).

Docker gives you root for everything in the container, which may not be the best option. Also, Docker only recently gained the option to verify downloaded images. rkt already has this feature, and it's only at version 0.5.5 (see also: https://github.com/coreos/etcd/blob/master/Documentation/sec...). Don't get me wrong, Docker is great: I think it's very good software and I'm glad I can use it. The CoreOS team just has a better vision and priorities, IMO.

I'm considering a blog post about CoreOS/Docker/rkt and similar tools. Would there be any interest?

Plenty of interest. Containerization is still a new paradigm and a lot of us are just getting started with it. I'd love to read your experiences.
I'll submit something to HN within a month, then. For now I can recommend this great guide: https://www.goettner.net/2015/coreos-and-docker-on-aws-revis...