Running 1000 containers in Docker Swarm

	Running 1000 containers in Docker Swarm (blog.codeship.com)
	80 points by titpetric 3382 days ago

7 comments

barhun 3382 days ago

2k nodes, 100k containers https://blog.online.net/2016/07/29/docker-swarm-an-analysis-...

schmichael 3382 days ago

5k nodes, 1 million containers, 5 minutes https://www.hashicorp.com/c1m/

(Disclaimer: I'm on the Nomad team but wasn't at the time of the post)

jacques_chester 3382 days ago

I don't know much about Nomad and couldn't work out from the repo what the jobs were. If I guess correctly, it's an app using Redis. Is that correct?

Disclosure: by coincidence of market forces, we're mortal enemies. Let's send christmas cards!

schmichael 3382 days ago

> it's an app using Redis. Is that correct?

Yup! Repo could definitely be clearer, but here's the code:

https://github.com/hashicorp/c1m/blob/master/schedbench/test...

Basically calls an increment in Redis and then blocks forever.

> Disclosure: by coincidence of market forces, we're mortal enemies. Let's send christmas cards!

Haha, hi mortal enemy! Christmas cards it is! If you're ever in Portland, OR I'll buy a beverage of your choice as well. :)

jacques_chester 3382 days ago

I extend the same offer in New York!

jacques_chester 3382 days ago

Since we're playing this game:

1.55k nodes, 250k containerised applications[0].

Mind you, it's hard to compare these as there's no real "cloud bench". For pure benchmark porn Nomad are the undisputed champs on their 1 million case.

The Cloud Foundry scaling test was intended to show a system with fully service-configured, fully-routed apps, with varying app characteristics (memory and RPS). To further stress the system, thousands of apps crash and are relaunched on a continuous basis.

Cloud Foundry installations with >10k containers have been ordinary for a while now; the 250k thing was to ensure we had lots of headroom and shake out chokepoints in Diego.

[0] https://content.pivotal.io/blog/250k-containers-in-productio...

Disclosure: I work for Pivotal, the majority donor of engineering on Cloud Foundry.

titpetric 3382 days ago

Big respect for your achievements. I guess at some point it just becomes a question of "where do I get 1000 nodes" vs. "how do I run 1000 containers". Or, more, the justification for that amount of hardware - I mean, the one dream job I would probably want is getting paid to cut hardware use while keeping reliability/availability/functionality. Like these guys who cut their AWS bill by $1mil/year in about 3 months - https://segment.com/blog/the-million-dollar-eng-problem/. The thing is that I'm not exactly sure where I'd fit in better - running this thing, or just fixing it for somebody else. I definitely know that I'm mostly dealing with pets and not cattle :)

jacques_chester 3382 days ago

Well, Cloud Foundry is deployed by BOSH. So you can, if you wish, use RackHD to deploy it to naked hardware (instead of OpenStack, GCP, AWS, Azure and I forget what else).

Your apps will still be containerised, distributed and wired up the same way.

There's always a point at which it makes engineering sense to flip the switch to doing it yourself. But that frontier is never static. We (plus our peers in the Cloud Foundry Foundation) and others in this space like Red Hat OpenShift are constantly pushing back the tipping point at which it makes economic sense to DIY.

We already have very large customers with very large engineering teams, who've built platforms before. And they are switching because that effort no longer makes business sense. It's an expense they don't need for a platform they're the only maintainers of.

One of our peers at IBM wrote about DIY[0]. We have our own much more markety-businessy whitepaper, with a very detailed case, on the same topic[1].

[0] https://hackernoon.com/stop-spending-engineering-effort-solv...

[1] https://content.pivotal.io/white-papers/the-upside-down-econ...

Disclosure: I work for Pivotal, etc.

peterwwillis 3382 days ago

I would use IPv6 for the orchestration network, probably not touch the tcp/ip parameters except for port range (and open file descriptor), and break up the broadcast domain into smaller networks. It is not advisable to have thousands of machines on one broadcast domain, and it is a pain in the ass to troubleshoot, not to mention causes bigger headaches when one network problem affects all the nodes across the entire gigantic network.
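To illustrate the "smaller networks" suggestion, here is a minimal sketch (not from the comment) that carves one large IPv6 block into per-rack /64 subnets using Python's standard `ipaddress` module. The address block `fd00:abcd::/48` and the one-subnet-per-rack layout are assumptions picked for the example.

```python
import ipaddress

# Hypothetical address block for the orchestration network (assumed, not from the thread).
orchestration_net = ipaddress.ip_network("fd00:abcd::/48")

# Split the /48 into /64 subnets so each rack (or group of nodes)
# gets its own, much smaller, broadcast domain.
subnets = orchestration_net.subnets(new_prefix=64)

# Take the first four subnets as an example allocation.
first_four = [next(subnets) for _ in range(4)]

for rack_number, net in enumerate(first_four):
    print(f"rack {rack_number}: {net}")
```

A network problem in one of these subnets then stays local to the nodes in that subnet instead of reaching every node.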

eblanshey 3382 days ago

Does anyone know how easy it is to set up autoscaling with Docker Swarm running on Google Cloud or AWS? We're looking to get started with Docker Swarm or Kubernetes soon, and are considering using Docker Swarm because of its simplicity and developer familiarity with Docker Compose (we use it for our dev environment). We just want to add nodes to a cluster as traffic spikes, and remove them as it subsides.

kingrolo 3382 days ago

Google Container Engine supports cluster autoscaling to automatically add nodes with load. It's listed as a beta feature though.

I've tried most of the Docker orchestration offerings and Container Engine seems by far the nicest. Swarm and Compose are really simple for getting up and running, but when we evaluated them there was still a missing piece required in that there was no neat way to do zero downtime deployments.

There's a tool called Kompose to convert docker-compose config to kubernetes manifests (https://github.com/kubernetes-incubator/kompose) although whilst it's nice to get you started we tend to maintain them separately now.

KenCochrane 3382 days ago

Have you looked at Docker for AWS yet? https://www.docker.com/aws

It will set up your swarm, which uses auto scaling groups for the worker nodes. You can then configure the auto scaling groups however you want, to scale based on your cloudwatch metrics, etc.

There is also a Docker for GCP product in beta. https://beta.docker.com but I don't know how auto scaling works for it.

Disclaimer: I work at Docker on the Docker for AWS product.

013a 3382 days ago

I think most people would suggest that, if your use case is at a stage where that is important to you, Swarm is not the right thing. Kubernetes or ECS are better choices.

truetuna 3382 days ago

I wouldn't recommend ECS. I've used it for a little over 6 months and even for trivial things, it lacks. A couple examples that come to mind include not being able to pass host environment variables into your container instances easily, and not being able to specify that a service must run on all hosts.

There's an open issue (made ~2 years ago) on GH for the 1st example and it still hasn't been resolved.

hefeweizen 3382 days ago

In the context of Docker Swarm and Kubernetes, autoscaling refers to container-level scaling, i.e. given a set of nodes, any autoscaling function would manage the number of containers that are currently running on these nodes.

For instance/node level autoscaling (which is closer to what you need), I would recommend using the autoscaling features provided by AWS/Google Cloud.
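A rough sketch of the node-level side of that distinction (not from the comment): given how many containers the orchestrator wants to run, work out how many nodes the cloud autoscaler should provide. The function name, the fixed `containers_per_node` capacity and the `min_nodes` floor are all assumptions for illustration; real autoscalers act on metrics such as CPU or memory.

```python
import math

def desired_nodes(container_count, containers_per_node, min_nodes=1):
    """Return how many nodes are needed to hold container_count containers.

    containers_per_node is an assumed fixed capacity per node;
    min_nodes keeps the cluster from scaling down to nothing.
    """
    if containers_per_node <= 0:
        raise ValueError("containers_per_node must be positive")
    needed = math.ceil(container_count / containers_per_node)
    return max(min_nodes, needed)

# Example: 1000 containers at an assumed 40 containers per node.
print(desired_nodes(1000, 40))
```

The container-level autoscaler changes `container_count`; the node-level one only has to keep the node count at or above this number.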

eblanshey 3382 days ago

> I would recommend using the autoscaling features provided by AWS/Google Cloud

It would have to be integrated with Kubernetes though -- when we push a new docker container, the container would need to be updated on any new machines created. We'll look into GCP's autoscale solution.

dalailambda 3382 days ago

The node level autoscaling doesn't need to be integrated with kubernetes, all it needs to do is create a new instance and register it as a node through normal channels.

Even if you don't need autoscaling, I'd suggest still using autoscaling groups and setting it to a fixed number of instances, so that instances will automatically get restarted if they go down.

hefeweizen 3382 days ago

Yeah, any new machine instance has to join the Swarm (or its equivalent in kubernetes-speak). But that can be decoupled from kubernetes or docker swarm mode.

As for image management, it would depend on how you would like to propagate new images. With a private docker registry, you could potentially point each new instance to the registry and take care of propagating new images. I favor this approach since it keeps everything separate and easier to manage.

acejam 3382 days ago

I suggest looking into Amazon ECS. They have auto scaling features that can trigger based on container-level alerts and thresholds.

hefeweizen 3382 days ago

Slight nitpick, but this article deals with "Docker Swarm mode" [1], which is different from Docker Swarm [2].

[1] https://docs.docker.com/engine/swarm/

[2] https://github.com/docker/swarm

[3] Difference between Docker Swarm and Swarm mode: http://stackoverflow.com/questions/40039031/what-is-the-diff...

xchaotic 3382 days ago

I always wonder, why not isolate on a process level, or even within a single, multi-threaded app? Sure, you can run some sort of web service on hundreds of docker containers, or you can run a single, fast web server that scales?

acejam 3382 days ago

When that single web server goes down, it's not so "fast" anymore.

patrickg_zill 3382 days ago

Agreed, though I keep wanting to take the time to get VRRP working with a web server to have redundancy. OpenBSD uses this to coordinate stateful firewalls with 2 or more systems, if 1 goes down all state info is present on the second node which takes over.

titpetric 3382 days ago

Hi, author of the article here: I have actually set up VRRP (well, UCARP) on Docker, so it's possible to containerize even this facet of running an HA ops stack with Docker as the infrastructure. However, as you say, it is only used for one active node plus a number of fail-overs in case that one goes down. In terms of maintenance (hosts do go down, scheduled downtime is common), it's priceless to have this part of the puzzle portable as well. If you want to check it out, there's a GitHub repo available here: https://github.com/titpetric/ucarp-ha - and a future article about it is planned as well. It will also become a part of the e-book which I'm currently working on and publishing on leanpub: https://leanpub.com/12fa-docker-golang :)

patrickg_zill 3382 days ago

OK so that runs on the host to the Docker instances. Pretty cool!

undersuit 3382 days ago

That sounds like a fixable problem. I'm pretty sure Erlang programmers could give some tips.

Why is worrying about a single web server going down more worrisome than some part of the Docker stack going down and causing the same issue?

titpetric 3382 days ago

Actually, neither should be a problem if you have enough redundancy :) The hardest part of rolling your own infrastructure is testing that mission-critical systems (like databases) are fault tolerant and at the same time reliable. Lots of great projects are out there that address some of these issues, but it takes a lot of attention to detail (transaction rates, ACID compliance, replication, etc.) to get it right. This is why a lot of developers who aren't in unicorn startups take advantage of technology available from giants like Amazon or Google, or from problem-domain-specific companies like CloudFlare. Netflix serves as a great example of a technology-driven company that is an inspiration to us, but there are many others that really changed the way we approach problems, like Tumblr and Etsy. But to stay on the topic of Netflix - I think their idea behind "chaos monkey" is great, and we're increasingly rolling out a (currently simple) docker swarm version of it - https://github.com/titpetric/docker-chaos-monkey - the best way to eliminate worry is to test failure scenarios. As docker chaos monkey is designed to unpredictably "kill off" containers, your system gets the benefit of being designed to handle failures. It's one of those problems that you have to have a passion for, however - it's like testing software. You're only testing software for the functionality and failures which you can predict, and I'm pretty sure that none of us can predict all the ways in which software (or distributed systems) can fail. As such, it's a never-ending occupation. :)
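For readers unfamiliar with the idea, the core loop of a chaos-monkey-style tool can be sketched roughly as below. This is not the code from the linked repository; the function names and the `kill` callback are hypothetical, and in a real setup `kill` would ask the container runtime to stop the chosen container.

```python
import random

def pick_victim(container_ids, rng=random):
    """Return one container ID chosen at random, or None if the list is empty."""
    if not container_ids:
        return None
    return rng.choice(list(container_ids))

def chaos_round(container_ids, kill, rng=random):
    """Pick a random container and hand it to the kill callback.

    kill is a hypothetical callable supplied by the caller; this sketch
    does not talk to Docker itself.
    """
    victim = pick_victim(container_ids, rng)
    if victim is not None:
        kill(victim)
    return victim

# Example run with a stand-in kill function that only records the victim.
killed = []
chaos_round(["web.1", "web.2", "db.1"], killed.append)
print(killed)
```

Running a round like this on a schedule and watching whether the service recovers is the test of the failure scenario.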

dboreham 3382 days ago

Because that wouldn't allow you to own a unicorn-size startup providing tools and technology for containerization?

jacques_chester 3382 days ago

Sometimes you have endpoint A and endpoint B.

They are part of the same app.

They should not have the same level of privilege.

The secrets in endpoint A's memory should not be visible to endpoint B and vice versa.

Containers increase assurance that this is so.

collyw 3379 days ago

Databases have had table-level privileges for decades. Not quite the same, but it's easy enough to use them for the same purpose.

jacques_chester 3377 days ago

I agree with you.

But if a single process has the single account on the database, how do you partition those permissions? Simply providing multiple logins won't help if you assume hostile code is in your process space.

On the other hand, if each service has its own login, then the database can enforce lowest authority for each. A compromise of one service isn't a game over scenario.

It's the difference between having a single account with the union of all permissions, or disjoint sets.
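A small sketch of that difference, using plain Python sets. The service names and permission strings are made up for illustration; the point is only what an attacker can reach after compromising one service under each setup.

```python
# Hypothetical services and permissions, for illustration only.
grants = {
    "endpoint_a": {"orders:read", "orders:write"},
    "endpoint_b": {"reports:read"},
}

def reachable_after_compromise(compromised, grants, shared_account):
    """Return the permissions an attacker holds after compromising one service."""
    if shared_account:
        # One database account carries the union of every service's permissions.
        return set().union(*grants.values())
    # Each service has its own login, so only its own set is exposed.
    return set(grants[compromised])

shared = reachable_after_compromise("endpoint_b", grants, shared_account=True)
separate = reachable_after_compromise("endpoint_b", grants, shared_account=False)

print(sorted(shared))
print(sorted(separate))
```

With a shared account the compromise of `endpoint_b` exposes the order permissions too; with separate logins it exposes only `reports:read`.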

merb 3382 days ago

systemd can do the same thing. the only thing that docker or containers add is immutability.

jacques_chester 3382 days ago

Cloud Foundry uses Garden which uses runC. But our Garden had a container system that predated docker and nspawn. So probably another case of Not Invented Yet Syndrome.

collyw 3381 days ago

Can someone that needs to run workloads like this explain to me why this is needed? It sounds like over engineering for the sake of it. There are only so many apps at Facebook scale in the world.

officelineback 3382 days ago

I'm still interested in how to merge features like AWS Autoscaling with Docker to right-size the underlying infrastructure for the amount of container work going on.

wwarren 3382 days ago

https://www.docker.com/aws oughta be a good place to start
