	by KarlPlatt 3905 days ago
	My experience with Ansible has not been so pleasant. Performance in particular is a showstopper. In my environment it takes 20 minutes to set up 12 servers with some Redis and Elasticsearch stuff. There are quite a few become_user directives, but 20 minutes for this kind of thing is just not acceptable. After all, application settings need to be tuned and iterated over, too. My idea was to develop the infrastructure with Ansible, e.g. no ssh to change some httpd settings at all. Everything via Ansible. It worked very well as long as the playbooks and the number of servers were very small.

4 comments

bovermyer 3904 days ago

This has been my experience as well. Even using a small subset of a playbook via tags can take a long time, especially if you're doing a run in serial. One of our deployments that only affects six servers takes fifteen minutes.

This can be mitigated somewhat by putting Ansible on the target machine, downloading all the necessary files to that machine, and then running Ansible locally... but that seems awfully fragile to me.

I am much more interested in Salt's ZeroMQ path these days. It seems to scale better, at least on paper and in my few small tests.

jalons 3904 days ago

SaltStack is moving away from ZeroMQ. https://docs.saltstack.com/en/latest/topics/releases/2015.8....

jimi_c 3904 days ago

I'd be interested in hearing how this stacks up against simply running the tasks via shell scripts, because the time to install packages/do other tasks is orders of magnitude higher than the connection overhead. Things will always be slow when doing `serial: 1`, so I'd definitely recommend a canary setup where you run a play with a small serial batch followed by a play with no serial limitation - that'll speed things up considerably.
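The canary layout described above could look roughly like the following two-play playbook. This is only a sketch: the group name `webservers`, the package name `myapp`, and the single task are placeholders that do not come from this thread.

```yaml
# Hypothetical playbook; "webservers" and "myapp" are placeholder names.

# Play 1: canary. Targets only the first host in the group, so a bad
# change surfaces here before it reaches the rest of the fleet.
- name: Canary run on a single host
  hosts: webservers[0]
  serial: 1
  tasks:
    - name: Install the application package
      package:
        name: myapp
        state: present

# Play 2: every remaining host, with no serial limit, so hosts are
# processed in parallel (bounded by the configured number of forks).
- name: Roll out to the remaining hosts
  hosts: webservers[1:]
  tasks:
    - name: Install the application package
      package:
        name: myapp
        state: present
```

If the canary host fails, every host in the first play has failed, so the run should stop before the second play starts.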

Finally, when using ControlPersist with pipelining mode in Ansible, it's as fast as, if not faster than, ZeroMQ or our own accelerated mode (which we will be deprecating at some future point, once older SSH installs are no longer as common).
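For readers who have not met these settings, here is a minimal sketch of turning both on through inventory variables. It assumes a reasonably recent Ansible release in which the `ansible_ssh_args` and `ansible_ssh_pipelining` variables are available (older releases set the equivalent `ssh_args` and `pipelining` options in `ansible.cfg`). The file location and the timeout value are illustrative.

```yaml
# group_vars/all.yml (hypothetical location; values are illustrative)

# ControlMaster/ControlPersist are OpenSSH client options: keep one SSH
# connection per host open in the background and reuse it for later
# tasks instead of reconnecting every time.
ansible_ssh_args: "-o ControlMaster=auto -o ControlPersist=60s"

# Pipelining: send module code over the existing SSH session instead of
# copying a file to the target first. Assumption: sudoers on the targets
# does not enforce "requiretty", which pipelining is known to conflict with.
ansible_ssh_pipelining: true
```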

bovermyer 3904 days ago

I'll admit to not knowing what ControlPersist is. I have some reading to do, it would seem.

pm90 3904 days ago

If you're using Ansible for orchestration, you could try using your cloud's orchestration service instead, e.g. Rackspace Cloud Orchestration or AWS CloudFormation. In this specific case, you can use the orchestration API to spin up and manage the servers, and use Ansible to manage the software (there is a way to manage software through orchestration as well [0]; I'm just not familiar enough with it to suggest it).

Disclaimer: I work in the Cloud Orchestration team at Rackspace.

[0]: https://github.com/openstack/heat-templates/tree/master/hot/...

ryan_lane 3904 days ago

CloudFormation is a shit-show. I wrote the boto_* modules in SaltStack to avoid using CloudFormation. It does magical shit like "oh, you wanted to change this one value? I'm going to rebuild entire portions of your cloud."

crdoconnor 3905 days ago

>After all, settings need to be tuned and iterated over, too.

That's why it has tags. So you can run just the settings states rather than running the whole 20 minute thing over and over again.
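As a rough illustration of that split, a playbook might tag the slow install step and the frequently changed settings step separately. The host group, file paths, and service name below are placeholders, not details from this thread.

```yaml
# Hypothetical playbook; group, paths and service name are placeholders.
- name: Configure Redis hosts
  hosts: redis
  become: true
  tasks:
    - name: Install Redis (slow, rarely changes)
      package:
        name: redis
        state: present
      tags: [install]

    - name: Render the Redis config (fast, changes often)
      template:
        src: redis.conf.j2
        dest: /etc/redis/redis.conf
      notify: Restart Redis
      tags: [settings]

  handlers:
    # Runs only when the template task above reports a change.
    - name: Restart Redis
      service:
        name: redis
        state: restarted
```

Assuming the file is saved as `site.yml`, running `ansible-playbook site.yml --tags settings` would then skip the install task and apply only the config change and any resulting restart.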

falcolas 3905 days ago

You can just run portions of the playbooks, but then you lose the value of a descriptive infrastructure. What does X look like? Depends on when each tag was run.

crdoconnor 3904 days ago

It shouldn't unless you're very careless. A tag that just updates all of the settings files and restarts the services should have the same effect as the full playbook run.

falcolas 3904 days ago

And if you add a new machine to the cluster, which hasn't gotten all tags run against it? Or if a machine was temporarily offline when a tag was run, or...

There are many potential situations where not running a full inventory against a running machine results in a machine not being properly configured.

crdoconnor 3904 days ago

>And if you add a new machine to the cluster

Then you'll probably be running the whole playbook again, including those settings changes you made before.

>Or if a machine was temporarily offline when a tag was run

Then it'll fail and give you a warning. Shrug your shoulders and run it again?

dsmithatx 3904 days ago

With Tower I have dynamic inventories and variables. I can make it as descriptive as I need to and view everything in Tower.

justingood 3904 days ago

Ansible 2.0 should have some new strategies to speed things up, depending on your requirements: https://docs.ansible.com/ansible/playbooks_strategies.html It will be interesting to see how performance is after it's released.
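As a sketch of what selecting a strategy looks like, the play below opts into the `free` strategy, under which each host works through the task list as fast as it can instead of waiting for every other host to finish the current task (the default `linear` behaviour). The group and package names are placeholders.

```yaml
# Hypothetical play; "elasticsearch" is a placeholder group and package name.
- name: Configure search nodes without waiting on the slowest host
  hosts: elasticsearch
  strategy: free
  tasks:
    - name: Install Elasticsearch
      package:
        name: elasticsearch
        state: present

    - name: Ensure Elasticsearch is running
      service:
        name: elasticsearch
        state: started
```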

We eventually settled on having Ansible build an AMI for us that can then be spun up as part of a CloudFormation template (also initiated by Ansible).

We've actually been moving further and further away from having Ansible handle the configuration management side of things, and toward having it deal primarily with orchestration.

swinglock 3904 days ago

What do you use for configuration management?

justingood 3904 days ago

We've moved to using HashiCorp's Consul-Template (https://github.com/hashicorp/consul-template). Ansible populates Consul with any required configuration changes during the deployment of a new version, and Consul-Template knows about these changes and automatically writes them to disk. Applications running on the host are then reloaded to pick up the changes.
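The Ansible half of that flow might look something like the play below, which writes one setting to Consul's key/value store through its HTTP API. This is a sketch and not the actual setup described here: the key name, the value, and the assumption of a local Consul agent on its default HTTP port are all illustrative.

```yaml
# Hypothetical deployment play; key, value and Consul address are placeholders.
- name: Publish new configuration to Consul during a deployment
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Write a setting to Consul's KV store over its HTTP API
      uri:
        url: "http://127.0.0.1:8500/v1/kv/myapp/max_connections"
        method: PUT
        body: "200"
        status_code: 200
```

On each host, Consul-Template would then watch that key and re-render the application's config file when the value changes; that part is configured outside Ansible and is not shown.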