by devit 3900 days ago
Every time I use some "configuration management" tool I wonder whether it's really better than just using shell.

Basically you lose a lot of time searching the web for how to do things that you already know how to do in shell, but the benefits are not so clear.

9 comments

I thought so too for a long time. Until the time I upgraded the RAID10 on our database servers from a 4-drive to an 8-drive configuration (which requires rebuilding the whole array if you want the performance benefits). Getting the intricate configuration of the two machines (postgres streaming replication works, but has a lot of moving parts to keep in mind) back without having to remember any details was absolutely priceless.

Completely wiping and reinstalling the main database servers (one after another of course) during the day while the system was in active use and completing the process with zero user intervention, that felt amazing.

Since then, whenever I had to reinstall a machine for one reason or another, I always appreciated the immense speed-up I gained by not having to ever manually re-do the configuration.

Better yet: All the years of growing the configuration, all the small insights learned over time, all the small fixes to the configuration: All are preserved and readily available. Even better: By using git, I can even go back in time and learn why I did what and when.

"Why am I using TCP for NFS? Oh right - that was back in December of 2012 when we were using UDP and we ran into that kernel deadlock" - that's next to impossible to do when you're configuring servers manually.

I don't think devit challenges the "automate" part, only the "separate tool" part. In Ansible you specify a sequence of commands just like you do in a shell script.

Take the time to learn a tool like Ansible. It is not about replaying a simple sequence of commands (imperative). It is more about declaring what you want your system to look like, and letting the tool decide which pieces need to run based on the current state of the system.

It's like make vs a shell script. If you use scripts to build your programs, you either have to write your own checks to test whether every step is necessary or not (cumbersome, error prone, and quite complex) or you just script it to build from scratch every time (inefficient).

But for systems management, rebuilding from scratch can be worse than just inefficient. Imagine if your script reinstalled MySQL from scratch every time you ran it...

Most shell commands are actually "declarative" in a sense.

If you run "apt-get -y install foo" that means that you want "foo" to be installed. If it's already installed, it just does nothing.

In Ansible, you'd use "apt: name=foo state=present" which does exactly the same thing as the apt-get command, but requires a web search to figure out how to write (assuming you know normal Linux system usage but haven't memorized Ansible).

The only differences seem to be that Ansible tells you whether the command made a change or not, and that you can parse the Ansible configuration with an external tool (assuming there are no loops, variables, etc.), but neither of these really seems that useful in practice.

> "apt-get -y install foo" that means that you want "foo" to be installed. If it's already installed, it just does nothing.

Not really. If you do that, it means that you want to update foo to the latest version the system knows about.

And other commands fail if the thing they are supposed to do is already done. Like `adduser`. So you could still run it and assume a failure to mean "the user already exists" - but it could of course also mean: "the user didn't exist, but creating it failed".

Then you start to have a look at the exit code which may be different between the two cases.

But every command behaves differently, so you need to learn all of this.
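To make that concrete, here is a rough sketch of what "check first, then act" looks like for users, so that a failure from the create step can only mean a real error. The function name `ensure_user` and the `CREATE_USER_CMD` override are made up for illustration, and it assumes `useradd` is the create command on your system:

```shell
#!/bin/sh
# Sketch only: idempotent user creation by checking state before acting.
# CREATE_USER_CMD is a hypothetical hook (not a standard variable) so the
# create step can be swapped out; it defaults to useradd.
CREATE_USER_CMD=${CREATE_USER_CMD:-useradd}

ensure_user() {
    name=$1
    # id -u succeeds only if the user already exists.
    if id -u "$name" >/dev/null 2>&1; then
        echo "ok: $name"          # already present, nothing to do
        return 0
    fi
    # The user is missing, so a failure here is a genuine creation error.
    if "$CREATE_USER_CMD" "$name"; then
        echo "changed: $name"     # created on this run
        return 0
    fi
    echo "failed: $name" >&2
    return 1
}
```

And that is one command. You end up writing a wrapper like this for every kind of resource you touch.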

With Ansible (or Puppet), the syntax is always the same and the actually needed operations are abstracted away.

Again, my advice is to take some time and learn the tools before discussing their strengths and weaknesses.

You missed more Ansible strengths, like detecting changes and restarting only affected services for example. Show me the idempotent shell script which does apt-get to install some dependency, updates some configuration file, and then starts or restarts a service depending on both the current state of the service (was it already running?) and whether the apt-get or config file change actually modified things.

Then scale that up. A lot.

There is even more. If you care to learn it before dismissing it.

It's not that hard to make a shell script idempotent - I've done it quite a bit. You check for the artifacts of an install and branch based on the results.
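For the config file case, a minimal sketch of the pattern might look like this. The names `ensure_file` and `restart_if_changed` are my own, not from any tool, and it only covers "did the file change?", not whether the service was already running:

```shell
#!/bin/sh
# Sketch only: copy a config file into place when its content differs,
# and remember whether anything changed so a restart can be conditional.
changed=no

ensure_file() {
    src=$1
    dest=$2
    # cmp -s is silent and succeeds only when both files are identical.
    if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
        return 0                      # content already matches
    fi
    cp "$src" "$dest" || return 1     # a real failure, report it
    changed=yes                       # note that a restart is needed
}

restart_if_changed() {
    # The arguments are the restart command to run, supplied by the caller.
    if [ "$changed" = yes ]; then
        "$@"
    fi
}
```

You would call `ensure_file` for each file and then something like `restart_if_changed systemctl restart foo` at the end, where the service name is whatever applies to you.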

I still use Salt instead of shell scripts, but that's mostly to have the authenticated/encrypted channels and only have to push my code to one place to run it globally.

Well, one of the main advantages of using a configuration management tool is that the configurations you're writing are actually repeatable, and these tools tend to provide you with a lot of modules that take care of this for you. If you were to use pure shell, you'd have to take a lot of things into account just to handle this aspect alone. Also, these tools provide abstractions that make it easier to execute things as a unit (such as adding a user and a number of things having to do with it) without having to think about all the details. Often, they can be used on multiple platforms in the same way, too. So yeah, I do think configuration management tools solve real problems.

There is always script:

    - script: /some/local/script.sh --some-arguments 1234
Originally, one of the selling points of Ansible was that you could just include a shell script and run it.

I've also often found much of my time wasted trying to get Ansible to do something simple.

Just being able to have your tool know the list of servers, and their roles makes it worth it.

I did a fair bit of work based on the OpenStack tripleO project, which suffered from the OpenStack NIH syndrome. They could not agree on a CM tool, and wrote it in bash. Never, ever, ever again. Trying to cluster RabbitMQ / Percona across 3 different machines, via bash is an abomination, whereas in Ansible / Salt etc. it is pretty easy :)

That's ok if you have a known good baseline configuration. In that case it's no different from, say, a Dockerfile.

However the config management stuff seems to come to light when you've got a mess on your hands and need to rationalise it and make it consistent.

These days, however, I'm slightly leaning towards the "rebuild with known good baseline" approach, even as a long-time Ansible user. Rather than upgrade stuff, I build something new alongside and then do a switcheroo nearly every time.

One day, hopefully containers will allow us to have consistent state everywhere.

How many machines and how many types of machines do you deal with?

I've kept an eye on https://commando.io/ for the same reason.

I think that the big problem with shell is that it doesn't really offer the right abstractions for a lot of this: one doesn't (normally) want to run:

    if [ ! -d /opt/foothing ]
      then rm -f /opt/foothing && mkdir /opt/foothing
    fi
    cd /opt/foothing
    tar xf /tmp/instpkg.tar.gz
    sed -e s/QQQbarvalQQQ/$BAR_SETTING/ -i /opt/foothing/config
    …
Normally, one just wants to install & configure foothing. Abstracting that away in shell is possible but a pain: it doesn't really have a rich language for composing paths and other variable values; quoting is a right royal pain; by the time one's written a fully-working shell script (note that the snippet above has no error-handling, breaks if /opt doesn't exist, breaks if $BAR_SETTING contains whitespace and doesn't enable one to override the foothing installation location), it's nearly impossible to read & understand.
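For comparison, here is roughly what a more defensive version of that snippet could look like. This is a sketch, not a complete installer: `install_foothing`, `FOOTHING_DIR` and `FOOTHING_PKG` are names I'm inventing for illustration, and it still doesn't cope with newlines in the setting:

```shell
#!/bin/sh
# Sketch only: the same install steps with quoting, error checks and
# overridable locations. FOOTHING_DIR and FOOTHING_PKG are hypothetical
# override variables, not anything a real package defines.
install_foothing() {
    bar=$1
    dir=${FOOTHING_DIR:-/opt/foothing}
    pkg=${FOOTHING_PKG:-/tmp/instpkg.tar.gz}

    # mkdir -p creates missing parents such as /opt, and fails if the
    # path exists but is not a directory.
    mkdir -p "$dir" || return 1
    tar -x -f "$pkg" -C "$dir" || return 1

    # Escape backslash, ampersand and the | delimiter so the value is
    # inserted literally by sed, whitespace included.
    esc=$(printf '%s' "$bar" | sed -e 's/[\\&|]/\\&/g') || return 1

    # Write to a temporary file first rather than relying on sed -i.
    sed -e "s|QQQbarvalQQQ|$esc|" "$dir/config" > "$dir/config.tmp" || return 1
    mv "$dir/config.tmp" "$dir/config"
}
```

That is already more than twice the size of the original, most of it plumbing, and it is still nowhere near "install & configure foothing".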

The Right Answer would involve a language which enabled one to create one's own syntactic abstractions in order to satisfy the general and specific needs of software installation. As an example, it'd be nice to have a WITH-INSTALLATION-DIRECTORY construct, which ensures that a directory exists, ensures that it's owned by the appropriate user, ensures that no other package already claims it (except that a previous version of the currently-being-installed package is okay), registers the directory and everything created in it during WITH-INSTALLATION-DIRECTORY as belonging to the currently-being-installed package, handles errors in a well-defined and useful manner for calling code, and so on and on and on.

And of course even that isn't high-level enough: If I'm installing bazit, which depends on foothing and quuxstuff, then I'll want to call something which ensures they exist. Or maybe there's an optional 'dependency,' and I want to do certain things if they exist and certain if not.

And maybe it's not low-level enough either. What if I want to override one particular sort of installation behaviour, but not the rest? What if I want to install a package in my own account, as myself? Wouldn't it be cool if I could set a few variables and the package manager Just Worked™?

As another user indicated, what all these tools really need is to be Lisp: versionable data which is code. As Shivers's work with scsh demonstrated, a Lisp-like language can be very pleasant to write POSIX applications in. Macros enable one to create useful syntactic constructs which make meaning, rather than details, clear. Dynamic variables (as in Common Lisp) easily enable customisation based on the call stack. CL's condition and restart systems are the gold standard for error signalling and recovery.

The benefits become very clear as soon as you need to manage more than 10 entities (instances, VMs, etc.) in a consistent, reproducible, clean manner.