Hacker News
by KarlPlatt 3905 days ago
My experience with Ansible has not been so pleasant. Performance in particular is a showstopper. In my environment it takes 20 minutes to set up 12 servers with some Redis and Elasticsearch stuff. There are quite a few become_user directives, but 20 minutes for this kind of thing is just not acceptable. After all, application settings need to be tuned and iterated over, too.

My idea was to develop the infrastructure entirely with Ansible, i.e. no SSHing in to change some httpd settings at all. Everything via Ansible. It worked very well as long as the playbooks and the number of servers were very small.

4 comments

This has been my experience as well. Even using a small subset of a playbook via tags can take a long time, especially if you're doing a run in serial. One of our deployments that only affects six servers takes fifteen minutes.

This can be mitigated somewhat by putting Ansible on the target machine, downloading all the necessary files to that machine, and then running Ansible locally... but that seems awfully fragile to me.

I am much more interested in Salt's ZeroMQ path these days. It seems to scale better, at least on paper and in my few small tests.

I'd be interested in hearing how this stacks up against simply running the tasks via shell scripts, because the time to install packages/do other tasks is orders of magnitude higher than the connection overhead. Things will always be slow when doing `serial: 1`, so I'd definitely recommend a canary setup where you run a play with a small serial batch followed by a play with no serial limitation - that'll speed things up considerably.
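
To make the canary idea concrete, here is a minimal sketch rather than a tested playbook. The `canary` and `webservers` group names and the `deploy` role are made up for illustration; substitute whatever your inventory actually uses.

```yaml
# Sketch only: the group names (canary, webservers) and the role name (deploy)
# are hypothetical placeholders.

# Play 1: roll out to the canary hosts one at a time.
- hosts: canary
  serial: 1
  roles:
    - deploy

# Play 2: no serial limit, so the remaining hosts are handled in parallel
# (up to the configured number of forks).
- hosts: "webservers:!canary"
  roles:
    - deploy
```

The slow, careful part is limited to the handful of canary hosts. Everything else runs at normal parallelism.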

Finally, when using ControlPersist with pipelining mode in Ansible, it's as fast as, if not faster than, ZeroMQ or our own accelerated mode (which we will be deprecating at some future point, once older SSH installs are less common).
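
For reference, ControlPersist is an OpenSSH client option that keeps a master connection open in the background so later connections to the same host can reuse it. As a rough sketch, both settings can be switched on through inventory variables. The file location and the 60s timeout below are illustrative assumptions, and this presumes an Ansible version that supports `ansible_ssh_common_args`. Pipelining also generally needs `requiretty` to be off in sudoers on the targets.

```yaml
# group_vars/all.yml (hypothetical location; the timeout value is an assumption)

# Reuse one SSH connection per host rather than reconnecting for every task.
ansible_ssh_common_args: "-o ControlMaster=auto -o ControlPersist=60s"

# Send modules over the already-open SSH session instead of
# copying them to a temporary file on the target first.
ansible_ssh_pipelining: true
```

The same options can alternatively be set in the `[ssh_connection]` section of ansible.cfg (`ssh_args` and `pipelining`).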

I'll admit to not knowing what ControlPersist is. I have some reading to do, it would seem.

If you're using Ansible for orchestration, you could try using your cloud's orchestration service instead, e.g. Rackspace Cloud Orchestration or AWS CloudFormation. In this specific case, you can use the orchestration API to spin up and manage the servers, and use Ansible to manage the software (although there is a way to manage software as well [0]; I'm just not familiar enough with it to suggest it).

Disclaimer: I work in the Cloud Orchestration team at Rackspace.

[0]: https://github.com/openstack/heat-templates/tree/master/hot/...

CloudFormation is a shit-show. I wrote the boto_* modules in SaltStack to avoid using CloudFormation. It does magical shit like "oh, you wanted to change this one value? I'm going to rebuild entire portions of your cloud."

>After all, settings need to be tuned and iterated over, too.

That's why it has tags: so you can run just the settings tasks rather than running the whole 20-minute thing over and over again.
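
A minimal sketch of what that could look like. The `redis` host group, the package name, the file paths, and the `settings` tag name are all hypothetical.

```yaml
# Sketch only: host group, package name, paths, and tag name are placeholders.
- hosts: redis
  become: true
  tasks:
    # Slow step, left untagged so a --tags run skips it.
    - name: Install Redis
      package:
        name: redis
        state: present

    # Fast step, tagged so it can be run on its own.
    - name: Write Redis config
      template:
        src: redis.conf.j2
        dest: /etc/redis/redis.conf
      tags: settings
      notify: restart redis

  handlers:
    - name: restart redis
      service:
        name: redis
        state: restarted

# Run only the tagged tasks:
#   ansible-playbook site.yml --tags settings
```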

You can just run portions of the playbooks, but then you lose the value of a descriptive infrastructure. What does X look like? Depends on when each tag was run.

It shouldn't unless you're very careless. A tag that just updates all of the settings files and restarts the services should have the same effect as the full playbook run.

And if you add a new machine to the cluster, which hasn't gotten all tags run against it? Or if a machine was temporarily offline when a tag was run, or...

There are many potential situations where not running a full inventory against a running machine results in a machine not being properly configured.

>And if you add a new machine to the cluster

Then you'll probably be running the whole playbook again, including those settings changes you made before.

>Or if a machine was temporarily offline when a tag was run

Then it'll fail and give you a warning. Shrug your shoulders and run it again?

With Tower I have dynamic inventories and variables. I can make it as descriptive as I need to and view everything in Tower.

Ansible 2.0 should have some new strategies to speed things up, depending on your requirements (https://docs.ansible.com/ansible/playbooks_strategies.html). It will be interesting to see how performance looks once it's released.
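
Going by the linked strategies documentation, opting into a non-default strategy is a per-play setting. A minimal sketch, with a hypothetical host group and role name:

```yaml
# Sketch only: the host group and role names are placeholders.
- hosts: elasticsearch
  # "free" lets each host work through the tasks as fast as it can,
  # instead of waiting for every host to finish a task (the default "linear").
  strategy: free
  roles:
    - elasticsearch
```

Whether this helps depends on the playbook. It mainly pays off when a few slow hosts would otherwise hold up everyone else.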

We eventually settled on having Ansible build an AMI for us that can then be spun up as part of a CloudFormation template (also initiated by Ansible).

We've actually been moving further and further away from having Ansible handle the configuration management side of things, and toward having it deal primarily with orchestration.

What do you use for configuration management?

We've moved to using HashiCorp's Consul-Template (https://github.com/hashicorp/consul-template). Ansible populates Consul with any required configuration changes during the deployment of a new version, and Consul-Template knows about these changes and automatically writes them to disk. Applications running on the host are then reloaded to pick up the changes.
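
As a rough illustration of the "Ansible populates Consul" half, here is one way it might be done with the `consul_kv` module. This is an assumption about the approach, not a description of the actual setup; the key name and value are made up, and the module expects a reachable Consul agent and the python-consul library on the host running the task.

```yaml
# Sketch only: key, value, and play layout are hypothetical.
- hosts: localhost
  connection: local
  tasks:
    - name: Publish a setting for Consul-Template to pick up
      community.general.consul_kv:
        key: myapp/config/max_connections
        value: "200"

# On the application hosts, a Consul-Template template could then read it with:
#   {{ key "myapp/config/max_connections" }}
```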